Abstract


In many important applications, a collection of mutually distrustful parties must
share information, without compromising their privacy. Currently, these applications
are often performed by using some form of a trusted third party (TTP); this
TTP receives all players’ inputs, computes the desired function, and returns the
result. However, the level of trust that must be placed in such a TTP is often
inadvisable, undesirable, or even illegal. In order to make many applications practical
and secure, we must remove the TTP, replacing it with efficient protocols for
privacy-preserving distributed information sharing. Thus, in this thesis we explore
techniques for privacy-preserving distributed information sharing that are efficient,
secure, and applicable to many situations.

As an example of privacy-preserving information sharing, we propose efficient
techniques for privacy-preserving operations on multisets. By building a framework
of multiset operations, employing the mathematical properties of polynomials,
we design efficient, secure, and composable methods to enable privacy-preserving
computation of the union, intersection, and element reduction operations. We
apply these techniques to a wide range of practical problems, including the
Set-Intersection, Over-Threshold Set-Union, Cardinality Set-Intersection, and
Threshold Set-Union problems. Additionally, we address the problem of determining
Subset relations, and even use our techniques to evaluate CNF boolean formulae.

We then examine the problem of hot item identification and publication, a problem
closely related to Over-Threshold Set-Union. Many applications of this problem
require greater efficiency and robustness than any previously designed secure
protocol for this problem offers. In order to achieve sufficiently efficient protocols for
these problems, we define two new privacy properties: owner privacy and data privacy.
Protocols that achieve these properties protect the privacy of each player’s personal
input set, as well as protecting information about the players’ collective inputs.
By designing our protocols to achieve owner and data privacy, we are able to
significantly increase efficiency over our privacy-preserving set operations, while still
protecting the privacy of participants. In addition, our protocols are extremely
flexible: nodes can join and leave at any time.
Chapter 1


Introduction 


As computer system capacity has increased, organizations and individuals have collected greater
and greater amounts of data. Developments in data mining have increased the utility of this
data; a data holder can now discover hidden trends and cheating customers. However, a great
deal of useful information cannot be computed from one party’s data alone. When multiple parties
can compare and share data, they can greatly increase the utility of their data. For example,
multiple pharmacies might compare their records to detect people filling a prescription multiple
times. Subtle trends can leave fingerprints in many data sets, hiding them from detection unless
multiple data sets are examined. The early stages of a disease epidemic might cause an increased
rate of absenteeism from work and school, higher sales of certain over-the-counter medications,
and many other small traces. Each of these traces, on its own, might not be enough to detect
an epidemic, as each fluctuates based on many factors. By combining many data sets, we can
detect increasingly subtle trends.

Data does not exist in a void, however. To detect trends that involve medical data, we must
use medical data collected from individuals. This data is generally considered to be private;
many countries have strict regulations that control the use of medical and other personal
information. Many other data sets are collected by companies who are concerned about preserving
the proprietary value of their data. Government data often has both privacy and security
restrictions associated with its use. Thus, we often cannot simply combine data sets held by
multiple parties to compute functions over the combined data. In the real world, parties often
resort to use of a trusted third party, who computes a fixed function on all parties’ private
inputs, or forgo the application altogether. This unconditional trust is fraught with security risks;
the trusted party may be dishonest or compromised, as it is an attractive target. The problem
of privacy-preserving distributed information sharing is to allow parties with private data sets
to compute these joint functions without use of a trusted party, and thus achieve many of the
benefits obtained from combining the data sets without undesirably revealing private data.

Protocols for privacy-preserving distributed information sharing must also be designed
around a number of practical concerns. Many data sets are extremely large; protocols that
operate on large data sets must be efficient in order to operate in the real world. In addition,
we must be concerned with robustness. Adversaries may attempt to manipulate the protocol
to learn private information or change the results. Some adversaries may even manipulate the
network, causing some players to become disconnected from others. In some problem scenarios,
such as those concerning detecting and defending against network attacks, robustness against
an unreliable network is paramount.

In this thesis, we examine several specific problems that fall under the heading of
privacy-preserving distributed information sharing. We design efficient protocols for these problems,
proving their security and correctness in the presence of attackers. The first problems we
examine are those related to privacy-preserving set and multiset operations. We then examine
in depth the problem of hot item identification, including its need for extreme efficiency and
robustness.


1.1 Privacy-Preserving Set and Multiset Operations

We design efficient privacy-preserving techniques and protocols for computation over multisets
by mutually distrustful parties: no party learns more information about other parties’ private
input sets than what can be deduced from the result of the computation.

For example, to determine which airline passengers appear on a ‘do-not-fly’ list, the airline
must perform a set-intersection operation between its private passenger list and the
government’s list. This is an example of the Set-Intersection problem. If a social services organization
needs to determine the list of people on welfare who have cancer, the union of each hospital’s
lists of cancer patients must be calculated (but not revealed), then an intersection operation
between the unrevealed list of cancer patients and the welfare rolls must be performed. This
problem may be efficiently solved by composition of our private union and set-intersection
techniques.

Another example of the use of these techniques is in privacy-preserving distributed network
monitoring. In this scenario, each node monitors anomalous local traffic, and a distributed
group of nodes collectively identifies popular anomalous behaviors: behaviors that are identified
by at least a threshold number t of monitors. This is an example of the Over-Threshold
Set-Union problem.

In this thesis, we propose efficient techniques for privacy-preserving operations on multisets.
By building a framework of multiset operations using polynomial representations and employing
the mathematical properties of polynomials, we design efficient methods to enable
privacy-preserving computation of the union, intersection, and element reduction¹ multiset operations.
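To give intuition for the polynomial representation, here is a minimal plaintext sketch, assuming a prime field GF(P) with the hypothetical choice P = 2^61 - 1. A multiset A is represented by f(x) = ∏_{a∈A}(x - a), so an element's multiplicity is its multiplicity as a root. Union multiplies polynomials, so multiplicities add; intersection is sketched as f·r + g·s for uniformly random polynomials r and s, which keeps the common roots of f and g while any other roots are, with overwhelming probability, not elements of either multiset. The helper names and the degree chosen for r and s are illustrative assumptions; in the protocols these operations act on encrypted coefficients, which this sketch does not model, so it offers no privacy.

```python
import random

# Hypothetical field modulus for illustration: the Mersenne prime 2**61 - 1.
P = 2**61 - 1


def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out


def poly_add(f, g):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]


def poly_eval(f, x):
    """Evaluate f at x using Horner's rule."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % P
    return acc


def multiset_poly(elements):
    """Represent a multiset A as f(x) = product over a in A of (x - a)."""
    f = [1]
    for a in elements:
        f = poly_mul(f, [(-a) % P, 1])
    return f


def divide_by_root(f, a):
    """Synthetic division of f by (x - a); assumes f(a) == 0."""
    high = list(reversed(f))  # highest-degree coefficient first
    q = [high[0]]
    for c in high[1:-1]:
        q.append((c + a * q[-1]) % P)
    return list(reversed(q))


def root_multiplicity(f, a):
    """How many times (x - a) divides f, i.e. the multiplicity of a."""
    count = 0
    while len(f) > 1 and poly_eval(f, a) == 0:
        f = divide_by_root(f, a)
        count += 1
    return count


def union_poly(f, g):
    """Multiset union: multiplying polynomials adds root multiplicities."""
    return poly_mul(f, g)


def intersection_poly(f, g, rng):
    """Intersection sketched as f*r + g*s with random r, s (degree is an assumption)."""
    deg = max(len(f), len(g)) - 1
    r = [rng.randrange(P) for _ in range(deg + 1)]
    s = [rng.randrange(P) for _ in range(deg + 1)]
    return poly_add(poly_mul(f, r), poly_mul(g, s))


rng = random.Random(2024)
f = multiset_poly([3, 5, 5])
g = multiset_poly([5, 5, 7, 9])
print(root_multiplicity(union_poly(f, g), 5))  # 4
print(root_multiplicity(intersection_poly(f, g, rng), 5))
```

With A = {3, 5, 5} and B = {5, 5, 7, 9}, the union polynomial has 5 as a root of multiplicity 4, and the intersection polynomial has 5 as a root of multiplicity 2 except with probability about 1/P.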

An important feature of our privacy-preserving multiset operations is that they can be
composed, and thus enable a wide range of applications. To demonstrate the power of our
techniques, we apply our operations to solve specific problems, including Set-Intersection,
Cardinality Set-Intersection, Over-Threshold Set-Union, and Threshold Set-Union, as well as
determining the Subset relation. Furthermore, we show that our techniques can be used to efficiently
compute the output of any function over multisets expressed in the following grammar, where
s represents any set held by some player and d ≥ 1:

$T ::= s \mid \mathrm{Rd}_d(T) \mid T \cap T \mid s \cup T \mid T \cup s$
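The grammar specifies what is computed, not how it is computed privately. As a plaintext reference for the ideal (TTP) functionality, a sketch using Python's collections.Counter follows; it assumes, consistently with polynomial multiplication, that union adds multiplicities and that intersection takes the minimum multiplicity, and it provides no privacy. Expressing Over-Threshold Set-Union as Rd_{t-1} applied to the union of all inputs is one illustrative way to write it in the grammar.

```python
from collections import Counter


def union(a: Counter, b: Counter) -> Counter:
    """Multiset union: multiplicities add, as with polynomial multiplication."""
    return a + b


def intersection(a: Counter, b: Counter) -> Counter:
    """Multiset intersection: each element keeps its minimum multiplicity."""
    return a & b


def reduce_by(a: Counter, d: int) -> Counter:
    """Element reduction Rd_d: an element seen m > d times is kept m - d times."""
    return Counter({x: m - d for x, m in a.items() if m > d})


# Over-Threshold Set-Union with threshold t, as Rd_{t-1}(s1 ∪ (s2 ∪ s3)).
s1 = Counter(["10.0.0.1", "10.0.0.2"])
s2 = Counter(["10.0.0.1", "10.0.0.3"])
s3 = Counter(["10.0.0.1", "10.0.0.2"])
t = 2
print(sorted(reduce_by(union(s1, union(s2, s3)), t - 1)))
# ['10.0.0.1', '10.0.0.2']

# The welfare example: (h1 ∪ h2) ∩ welfare.
h1 = Counter(["alice", "bob"])
h2 = Counter(["carol"])
welfare = Counter(["bob", "carol", "dave"])
print(sorted(intersection(union(h1, h2), welfare)))  # ['bob', 'carol']
```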

Note that any monotonic function over multisets² can be expressed using our grammar, showing
that our techniques have truly general applicability. Finally, we show that our techniques are
applicable even outside the realm of set computation. As an example, we describe how to utilize
our techniques to efficiently and privately evaluate CNF boolean functions.

¹The element reduction by d, Rd_d(A), of a multiset A is the multiset composed of the elements of A such that
for every element a that appears in A at least d' > d times, a is included d' - d times in Rd_d(A).

²Any function computed with only intersection and union, without use of an inverse operation.

Our protocols are more efficient than the results obtained from previous work. General
multiparty computation is the best previous result for most of the multiset computation problems
we address in this thesis. Only the private Set-Intersection problem and the two-party Cardinality
Set-Intersection problem have been previously studied [1, 23]. However, previous work only
provides protocols for 3-or-more-party Set-Intersection that are secure only against honest-but-curious
players; it is not obvious how to extend this work to achieve security against malicious players.
Also, previous work focuses on achieving results for the Set-Intersection problem in isolation;
these techniques cannot be used to compose set operations. In contrast, we provide efficient
solutions for private multi-party Set-Intersection secure against malicious players, and our
multiset intersection operator can be easily composed with other operations to enable a wide range
of efficient private computation over multisets. We compare the communication complexity of
our protocols with previous work and solutions based on general multiparty computation in
Table 4. Note that the techniques utilized to create the circuits for the general solution are both
complex and incur very large constants, on top of the constants inherent in the use of general
multiparty computation [2]; we thus achieve greater practical efficiency, as well as asymptotic
efficiency.

Our protocols are provably secure in the PPT-bounded adversary model. We consider both
standard adversary models: honest-but-curious (HBC) adversaries and malicious adversaries.
For protocols secure in the HBC model, we prove that the information learned by any coalition
of honest-but-curious players is indistinguishable from the information learned in the ideal
model, where a trusted third party (TTP) calculates the function. For protocols secure in
the malicious model, we provide simulation proofs showing that for any strategy followed by a
malicious coalition Γ in the real protocol, there is a translated strategy they could follow in the
ideal model, such that, to Γ, the real execution is computationally indistinguishable from ideal
execution.


1.2 Privacy-Preserving Distributed Hot Item Identification and Publication

In this thesis, we consider a scenario in which a group of distributed nodes, each holding its
local data set, would like to collectively identify commonly occurring items. More formally,
a “commonly occurring item” is one that appears in at least a threshold number of nodes’
local data sets. We call such items hot items. The problem of identifying and publishing
these hot items is closely related to the Over-Threshold Set-Union problem we examine in
Chapter 4. Distributed identification of hot items, while preserving privacy, is important for
many applications.

For example, in distributed network monitoring, each participant monitors its local traffic
and the participants collectively need to identify common offenders (IP addresses that are
flagged as malicious by multiple sites) and common alerts (events that are flagged as anomalous
or malicious by multiple sites). Identifying common offenders and alerts is important to enable
defenses against widespread attacks as well as to reduce the false positive rate; an offender or
alert reported by multiple sites is more likely to be truly malicious.

Many more applications use hot item identification for statistics gathering. In computer
troubleshooting, common configurations among unbroken computers can be used to identify
configuration errors and suggest fixes for troubled computers [25, 30, 55]. In a distributed
content delivery network (CDN), distributed identification of hot pages (web pages that are
commonly requested at different sites) is important for making effective caching decisions; hot
pages should have higher priority when caching [10].
In each of these applications, it is crucial to identify hot items held by distributed players.
At the same time, each local data set, be it local network traffic, a computer’s configurations,
or a site’s web surfing traffic, often contains personal or security-critical data; thus, we need
effective methods to identify hot items without revealing information about the non-hot items.
Moreover, in many distributed applications, some participants may not be trustworthy, and may
even be malicious. Thus, we need to design effective methods to enable distributed
privacy-preserving hot item identification in the presence of malicious participants. Additionally, as
players may join and leave the network frequently, we must construct protocols that do not
require global knowledge of the network.

Efficient, secure, and privacy-preserving hot item identification and publication is a challenging
problem; previous solutions are largely insufficient. One approach is for every participant
to send his data to a trusted central authority, who identifies and announces the hot items.
In this approach, all security and privacy relies on the central authority. This level of trust is
unacceptable for many situations. Even if the authority is trusted, an attacker may compromise
the authority, and players’ privacy with it. Several other problems inherent in centralization
are discussed in [37].

Another approach is to use a partially homomorphic cryptosystem, such as a distributed
version of Paillier [21, 22, 45], or a secure multi-party computation scheme to compute the
frequencies of each item or to identify hot items directly. However, these methods have significant
computation and communication overhead, including zero-knowledge proofs; this overhead is often
prohibitive for large-scale applications. Key management for maintaining shared keys in the
presence of malicious parties and of players that join and leave the network may add non-trivial
overhead and complexity.

All previous approaches of which we are aware (except our Over-Threshold Set-Union protocols of
Chapter 4) propose heuristic solutions [37]; some even require additional trust assumptions
such as “friends” [30, 55]. They do not preserve several forms of privacy that we believe
are important (see Section 5.1), do not give rigorous analysis, and cannot prevent malicious
participants from changing the result arbitrarily.

In this thesis, we propose new techniques for efficient, secure, and privacy-preserving
distributed hot item identification and publication. To avoid counting the occurrences of each
hot item separately, we utilize a probabilistic filtering technique, allowing both efficiency and
privacy. Each player constructs a local filter, which is then combined with those of other
players to create a global filter. In the process of combination, we utilize an approximate counting
technique which is both efficient and secure against undue interference by malicious parties.
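The idea of combining local filters into a global filter can be illustrated with a simple counting-filter sketch. This is not the construction of Chapter 5: it assumes exact per-bucket counts, a hypothetical filter size M and number of hash functions K, and SHA-256-derived bucket indices, and it omits the approximate counting, one-show tags, and privacy mechanisms entirely. It does show the filtering property: an item held by at least t players always passes (no false negatives), while an item held by fewer players passes only if all of its buckets happen to collide with buckets set by other items.

```python
import hashlib

M = 1024  # hypothetical number of filter buckets
K = 3     # hypothetical number of hash functions per item


def bucket_indices(item):
    """Derive K bucket indices for an item from SHA-256 with an index prefix."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % M
        for i in range(K)
    ]


def local_filter(items):
    """One player's filter: bucket b is 1 if some local item hashes to b."""
    bits = [0] * M
    for item in set(items):  # a player contributes at most once per item
        for b in bucket_indices(item):
            bits[b] = 1
    return bits


def global_filter(local_filters):
    """Combine filters by counting, per bucket, how many players set it."""
    return [sum(column) for column in zip(*local_filters)]


def is_candidate_hot(item, counts, t):
    """An item passes if each of its buckets was set by at least t players."""
    return all(counts[b] >= t for b in bucket_indices(item))


players = [["10.0.0.1", "10.0.0.2"], ["10.0.0.1", "10.0.0.3"], ["10.0.0.1"]]
counts = global_filter([local_filter(p) for p in players])
print(is_candidate_hot("10.0.0.1", counts, t=3))  # True
```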

Protocols for hot item identification and publication that achieve standard cryptographic
definitions of privacy [28], including our protocols of Chapter 4, are too inefficient for many
applications. We design protocols that enable these demanding applications by trading a certain
degree of privacy for greater efficiency; as a result, our protocols are asymptotically comparable
in efficiency to approaches that do not protect the privacy of participants, as described in
Section 5.4.4. We also construct one-show tags to prevent malicious players from tampering
with the identification of hot items. Our protocols scale extremely well when increasing the
number of players. If the hot-item threshold is proportional to the number of players, then
the bandwidth used per node is essentially constant as the number of players increases, which is
optimal (see Section 5.4.4).

Elements of honest players’ private input sets are protected by data privacy. This property,
which we rigorously define, is weaker than a standard notion of cryptographic privacy [28], but
can be achieved more efficiently. Essentially, the data privacy property states that non-hot
items are hidden in a crowd of indistinguishable elements, and that the rarer an item is,
the less information is revealed about it. Players who publish their hot items are protected by
owner privacy, which we also rigorously define. Players may choose between correlated
owner privacy, in which published elements cannot be associated with the publishing player,
and uncorrelated owner privacy, in which we enforce the additional guarantee that no player
can distinguish whether two items have appeared in the same private input set.

Our protocol prevents a group of malicious players from influencing the identification and
publication of hot items: no group of malicious players may cause any element to be identified
as a hot item with higher probability than if it simply appeared in the malicious players’ private
input sets.

Unlike previous work, our protocols are extremely flexible in situations in which untrusted
clients often join and leave the network. As we require no threshold cryptography, secure
multi-party computation, or global knowledge of the network, no consistent set of players is needed
to execute a protocol. No player need trust any other.

We prove bounds on the probability of correctness of our protocols, as well as for data- and
owner-privacy. The approach we introduce in this thesis is applicable to many situations. Our
protocols are the most efficient hot item identification and publication protocols of which we
are aware that achieve the properties of data and owner privacy.


1.3 Thesis Outline

We begin by discussing related work in Chapter 2 and cryptographic and mathematical
preliminaries in Chapter 3. We then introduce our techniques and protocols for privacy-preserving set
and multiset computation in Chapter 4. In Chapter 5, we introduce our protocols for
privacy-preserving distributed hot item identification and publication. We conclude in Chapter 6.
Appendix B contains additional proofs and information about our results in Chapter 5.
Chapter 2


Related Work


In this chapter, we consider previous work related to our privacy-preserving multiset operations
and hot-item identification. We then discuss the distinctions between our results and previous
work.


2.1 Works Related to Multiset Operations

Most of the privacy-preserving multiset operations and functions we address in this thesis
(Chapter 4) have no better result in previous work than through general multiparty
computation. General two-party computation was introduced by Yao [56], and general computation for
multiple parties was introduced in [5]. In general multiparty computation, the players share
the values of each input, and cooperatively evaluate the circuit. For each multiplication gate,
the players must cooperate to securely multiply their inputs and re-share the result, requiring
O(n) communication for honest-but-curious players and O(n^2) communication for malicious
players [28]. Recent results that allow non-interactive private multiplication of shares [16] do
not extend to our adversary model (see Section 3.1), in which any c < n players may collude.
Our results are more efficient than the general MPC approach; we compare communication
complexity in Table 4.
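Why multiplication gates need interaction can be seen in a small sketch of Shamir secret sharing, one standard way of sharing values in general multiparty computation (an assumption for illustration; this does not reproduce the protocols of [5] or [28], and the modulus P is a hypothetical choice). Adding shares pointwise is purely local and keeps the sharing polynomial at degree t, but multiplying shares pointwise yields a degree-2t polynomial, so the players must run an interactive re-sharing (degree-reduction) step, which is where the per-gate communication comes from. The degree-reduction step itself is omitted.

```python
import random

P = 2**31 - 1  # prime modulus, a hypothetical choice for illustration


def share(secret, t, n, rng):
    """Shamir-share `secret` with a random degree-t polynomial; player i gets f(i)."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [
        sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P
        for i in range(1, n + 1)
    ]


def reconstruct(points):
    """Lagrange interpolation at x = 0 from a list of (x, y) pairs."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def first_points(ys, k):
    """Pair the first k shares with their evaluation points 1..k."""
    return [(i + 1, ys[i]) for i in range(k)]


rng = random.Random(1)
n, t = 5, 1
a_shares = share(6, t, n, rng)
b_shares = share(7, t, n, rng)

# Addition gate: each player adds its two shares locally; degree stays t.
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]
print(reconstruct(first_points(sum_shares, t + 1)))  # 13

# Multiplication gate: local products lie on a degree-2t polynomial, so t + 1
# shares no longer suffice; real protocols re-share (reduce the degree) interactively.
prod_shares = [(x * y) % P for x, y in zip(a_shares, b_shares)]
print(reconstruct(first_points(prod_shares, 2 * t + 1)))  # 42
```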

One privacy-preserving function that has been considered in both our results and previous
work is set intersection. Agrawal, Evfimievski, and Srikant [1] and Freedman, Nissim, and
Pinkas (FNP) [23] proposed protocols for problems related to two-party Set-Intersection. FNP
also proposed protocols for multiparty set intersection (secure only against honest-but-curious
players) and two-party cardinality set intersection. FNP’s results are based on the representation
of sets as roots of a polynomial [23]. Their work does not utilize properties of polynomials beyond
evaluation at given points. In Chapter 4 of this thesis, we explore the power of polynomial
representation of multisets, using operations on polynomials to obtain three composable
privacy-preserving multiset operations. We give a more detailed comparison of our Set-Intersection
protocol with FNP in Table 4.

In addition to previous work on privacy-preserving set intersection, researchers have designed
protocols for privacy-preserving computation of several related functions. For example, private
equality testing is the problem of set-intersection for the case in which the size of the private
input sets is 1. Protocols for this problem are proposed in [19, 38, 43], and fairness is added
in [8]. Another related problem is testing the disjointness of private input sets [34], a
restricted version of the Cardinality Set-Intersection problem. We do not enumerate the works
on privacy-preserving computation of other functions here, as they address drastically different
problems and cannot be applied to our setting.


2.2 Works Related to Hot-Item Identification

In essence, hot-item identification is a small variation on the problem of Over-Threshold
Set-Union. We address the Over-Threshold Set-Union problem in Chapter 4, with previous work
in a cryptographically secure setting considered in Section 2.1. In hot-item identification, the
players in the protocol approximate the desired results, and give up a certain measure of privacy
in exchange for increased efficiency and robustness. For example, our Over-Threshold Set-Union
protocol requires the players to share a decryption key and perform joint decryption. We do
not believe that such an assumption is tenable in all situations.

Several applications of privacy-preserving hot item identification and publishing have been
considered in previous work. Certain privacy-breaching attacks against distributed network
monitoring nodes were described in [37]. The authors did not, however, give a concrete definition of
security for their attempts to defeat such attacks, and their techniques require trusted central
servers. Additionally, in many cases, significant breaches in privacy occur when outlier elements,
which appear in very few other players’ servers, are revealed; their techniques do not address such
concerns. Privacy-preserving collection of statistics about computer configurations has also
been considered in previous work [30, 55]. Like the work in [37], these works do not give a concrete
definition of security, but instead a technique for heuristically confusing attackers. Their
approach also relies on chains of trust between friends, unlike our approach, in which nodes may
be arbitrarily malicious. Without a formal definition of security, it is nearly impossible to
evaluate the privacy claims of these works. We also believe some of the assumptions made in these
works are untenable in many scenarios.

In a non-distributed context, [4, 27, 32] examine the identification of elements that appear
often in a data stream, through the use of approximate counting. We generalize this task to a
distributed setting, as well as enforcing important privacy properties.
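As a concrete illustration of approximate counting over a stream, the following sketch implements the classic Misra-Gries frequent-items summary. It is offered only as a representative of this family of techniques; we do not claim it is the algorithm used in [4, 27, 32]. With at most k - 1 counters, every item occurring more than n/k times in a stream of length n is guaranteed to remain, and each stored count underestimates the true frequency by at most n/k.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters.

    Every item whose frequency exceeds len(stream) / k remains in the result;
    stored counts are underestimates by at most len(stream) / k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries(["a", "a", "b", "a", "c", "a", "d", "a"], k=2))  # {'a': 2}
```

Here "a" occurs 5 times in a stream of length 8, more than 8/2, so it must survive; its stored count of 2 underestimates the true count by 3, within the n/k = 4 bound.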


Chapter 3


Preliminaries


In this thesis, we utilize several cryptographic and mathematical tools described in previous
work. We briefly describe these tools in this chapter, including references to fuller descriptions,
as well as the standard adversary models utilized in this thesis.


3.1 Adversary Models

In this section we describe the adversary models used in the work throughout this thesis. We
provide intuition and informal definitions of these models; formal definitions can be found
in [28].

3.1.1 Honest-But-Curious Adversaries

Honest-but-curious adversaries act according to their prescribed actions in the protocol.
Security against such adversaries is straightforward: no player or coalition of c < n
honest-but-curious players (who may cheat by sharing their private information) gains information about
other players’ private input sets, other than what can be deduced from the result of the protocol.
This is formalized by considering an ideal implementation where a trusted third party (TTP)
receives the inputs of the parties and outputs the result of the defined function. We require
that in the real implementation of the protocol (that is, one without a TTP), each party does
not learn more information than in the ideal implementation, with overwhelming probability.


3.1.2 Malicious Adversaries

Malicious adversaries may behave arbitrarily, in contrast to honest-but-curious adversaries who
follow the specified protocol. In particular, we cannot hope to prevent malicious parties from
refusing to participate in the protocol, choosing arbitrary values for their private data inputs, or
aborting the protocol prematurely. Instead, we focus on the standard security definition (see,
e.g., [28]), which captures the correctness and the privacy issues of the protocol. Informally,
the security definition is based on a comparison between the real protocol and an ideal model
with a TTP, in which a malicious party may give arbitrary input to the TTP. The security
definition is also limited to the case where at least one of the parties is honest. Let Γ be the
set of colluding malicious parties; for any strategy Γ can follow in the real protocol, there is a
translated strategy that it could follow in the ideal model, such that, to Γ, the real execution
is computationally indistinguishable from execution in the ideal model.

The simulator communicates with the malicious parties Γ according to the
protocol. Using his special abilities as a simulator, he obtains their private
inputs, submits this data to the trusted third party, and communicates the
result returned by the trusted third party to the malicious parties.

Figure 3.1: Basic outline of a standard simulation proof.

A simulation proof is a common method of proving security under such a definition: the
simulator G provides a concrete method of translating any strategy executed by Γ to a strategy
in the TTP model. We illustrate such a proof in Figure 4.9.

In this thesis we consider only the class of PPT adversaries, whether they are malicious or
honest-but-curious.

3.2 Multiset Operations Preliminaries

In this section, we describe several cryptographic tools that we utilize in our constructions of
Chapter 4.

3.2.1  Additively  Homomorphic  Cryptosystem 

In Chapter 4, we utilize a semantically secure [29], additively homomorphic public-key cryptosystem whose plaintext domain can be chosen to be a ring $R$ of arbitrarily large size. Let $E_{pk}(\cdot)$ denote the encryption function with public key $pk$. The cryptosystem supports the following operations, which can be performed without knowledge of the private key. (1) Given the encryptions of $a$ and $b$, $E_{pk}(a)$ and $E_{pk}(b)$, we can efficiently compute the encryption of $a + b$, denoted $E_{pk}(a + b) := E_{pk}(a) +_h E_{pk}(b)$. (2) Given a constant $c$ and the encryption of $a$, $E_{pk}(a)$, we can efficiently compute the encryption of $ca$, denoted $E_{pk}(c \cdot a) := c \times_h E_{pk}(a)$.

When such operations are performed, we require that the resulting ciphertexts be re-randomized for security. In re-randomization, a ciphertext is transformed so as to form an encryption of the same plaintext under a different random string than the one originally used. We also require that the homomorphic public-key cryptosystem support secure $(n, n)$-threshold decryption, i.e., the corresponding private key is shared by a group of $n$ players, and decryption must be performed by all players acting together. Finally, we require that no PPT adversary can recover the sizes of the subfields of $R$ with greater than negligible probability.
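As a concrete illustration of operations (1) and (2) and of re-randomization, the following is a minimal sketch of Paillier's cryptosystem [45] with generator $g = N + 1$. Paillier's scheme is one system that meets these requirements, as noted below. This is a toy and should not be used for anything else. The primes $2^{31}-1$ and $2^{61}-1$ are far too small for security, and threshold decryption is omitted. The helper names add_h, smul_h, and rerandomize are our own hypothetical labels for $+_h$, $\times_h$, and re-randomization.

```python
import math
import random

# Toy primes (2**31 - 1 and 2**61 - 1 are Mersenne primes); NOT secure sizes.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with generator g = n + 1, mu = lam^(-1) mod n
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    n2 = n * n
    r = rng.randrange(1, n)  # coprime to n except with negligible probability
    return (1 + (m % n) * n) * pow(r, n, n2) % n2  # g^m * r^n with g = n + 1

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_h(n, c1, c2):
    """E(a) +_h E(b): multiplying ciphertexts adds the plaintexts mod n."""
    return c1 * c2 % (n * n)

def smul_h(n, k, c):
    """k x_h E(a): exponentiating a ciphertext scales the plaintext mod n."""
    return pow(c, k % n, n * n)

def rerandomize(n, c, rng=random):
    """Same plaintext, fresh randomness: multiply by an encryption of 0."""
    n2 = n * n
    return c * pow(rng.randrange(1, n), n, n2) % n2

n, sk = keygen()
ca, cb = encrypt(n, 20), encrypt(n, 22)
print(decrypt(n, sk, add_h(n, ca, cb)))  # 42
print(decrypt(n, sk, smul_h(n, 3, ca)))  # 60
```

Because plaintexts live in $\mathbb{Z}_N$, scaling by $-1$ (computed as $N - 1$) yields an encryption of $-a \bmod N$. This is what lets the polynomial operations later in Chapter 4 handle the negated roots in $(x - a)$.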

When we utilize an additively homomorphic cryptosystem in protocols secure against malicious players, we require three things. (1) The decryption protocol must be secure against malicious players. Typically, this is done by requiring each player to prove in zero-knowledge that he has followed the threshold decryption protocol correctly [26]. (2) The cryptosystem must allow efficient construction of zero-knowledge proofs of plaintext knowledge. (3) Optionally, it must allow efficient construction of certain zero-knowledge proofs concerning the use of the cryptosystem's homomorphic properties, as detailed in Section 4.4.1.

Note  that  Paillier’s  cryptosystem  [45]  satisfies  each  of  our  requirements:  it  is  additively 
homomorphic,  supports  ciphertext  re-randomization  and  threshold  decryption  (secure  in  the 
malicious  case)  [21,  22],  allows  efficient  zero-knowledge  proofs  for  the  cases  that  we  require 
(these  are  standard  constructions  from  [9,  14]  and  proof  of  plaintext  knowledge  [15]),  and 
recovering  the  sizes  of  the  subfields  of  the  plaintext  domain  R  is  equivalent  to  breaking  the 
semantic  security  of  the  cryptosystem.  Key  generation  can  be  performed  in  a  distributed 
fashion  for  these  distributed  Paillier  schemes  [21,  22]. 

In Chapter 4, we simply use $E_{pk}(\cdot)$ to denote the encryption function of a homomorphic cryptosystem which satisfies all the aforementioned properties.


3.2.2  Shuffle  Protocol 

Let each player $i$ ($1 \le i \le n$) in the Shuffle protocol have a private input multiset $V_i$. We define the Shuffle problem as follows. All players learn the joint multiset $V_1 \cup \cdots \cup V_n$, such that no player or coalition $\Gamma$ of $c < n$ players can gain a non-negligible advantage in distinguishing, for each element $a \in V_1 \cup \cdots \cup V_n$, an honest player $i$ ($1 \le i \le n$, $i \notin \Gamma$) such that $a \in V_i$. That is, the origin of each element contributed by an honest player to the joint multiset $V_1 \cup \cdots \cup V_n$ is anonymous to any player or coalition of $c < n$ players. A Shuffle protocol may be secure against honest-but-curious or malicious players. We specify this security requirement in the context of the protocol's use.

In several protocols in Chapter 4, we will impose an additional privacy condition on the Shuffle problem. The multisets $V_1, \ldots, V_n$ are composed of ciphertexts, which must be re-randomized so that no player may determine which ciphertexts were part of his private input multiset. The revised problem statement is as follows. All players learn the joint multiset $V_1 \cup \cdots \cup V_n$, such that no player or coalition of players can gain a non-negligible advantage in distinguishing, for each element $a \in V_1 \cup \cdots \cup V_n$, a player $i$ ($1 \le i \le n$) such that $a \in V_i$. That is, the origin of each element contributed by any player to the joint multiset $V_1 \cup \cdots \cup V_n$ is anonymous to any player or coalition of $c < n$ players.
Both variants of the Shuffle protocol can be easily accomplished with standard techniques [13, 17, 24, 31, 44], with communication complexity at most $O(n^2 k)$.
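The following is a minimal sketch of one common way to obtain the re-randomizing variant. It uses a re-encryption mix, in which each player in turn re-randomizes every ciphertext and applies a secret random permutation. The sketch makes several assumptions. It reuses a toy Paillier setup with insecure parameter sizes. It assumes honest-but-curious players, so no proofs of correct shuffling are included. The name mix_shuffle is hypothetical. We do not claim it is the construction of any specific cited work. In this sketch, anonymity rests on at least one player outside the coalition performing a genuine shuffle.

```python
import math
import random

# Toy Paillier parameters (insecure sizes), generator g = N + 1.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m, rng):
    return (1 + (m % N) * N) * pow(rng.randrange(1, N), N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def rerandomize(c, rng):
    # Multiply by a fresh encryption of 0: same plaintext, new ciphertext.
    return c * pow(rng.randrange(1, N), N, N2) % N2

def mix_shuffle(ciphertexts, n_players, rng):
    """Each player in turn re-randomizes all ciphertexts and permutes them.
    A single rng stands in for every player's private randomness here."""
    batch = list(ciphertexts)
    for _ in range(n_players):
        batch = [rerandomize(c, rng) for c in batch]
        rng.shuffle(batch)  # this player's secret permutation
    return batch

rng = random.Random(2024)
inputs = [enc(v, rng) for v in (5, 7)] + [enc(v, rng) for v in (7, 9)]  # V_1, V_2
out = mix_shuffle(inputs, n_players=2, rng=rng)
print(sorted(dec(c) for c in out))  # [5, 7, 7, 9]
```

The joint multiset survives intact. However, no output ciphertext equals any input ciphertext, so a player cannot recognize his own contribution. The total traffic is $n$ passes over $nk$ ciphertexts, which is consistent with the $O(n^2 k)$ bound above.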
Chapter 4

Privacy-Preserving Set and Multiset Operations

Sets  and  multisets  are  common  data  formats;  many  database  operations  may  be  represented 
as  operations  on  sets  and  multisets.  We  thus  examine  in  this  chapter  an  important  application 
of  privacy-preserving  distributed  information  sharing:  composable  privacy-preserving  set  and 
multiset  operations,  and  secure  protocols  based  on  these  operations. 

We begin by describing our composable operations and their mathematical foundations in Section 4.1, before proceeding to construct protocols for several important applications. Several of these protocols are secure against honest-but-curious adversaries: Set-Intersection and Cardinality Set-Intersection (Section 4.2), as well as Over-Threshold Set-Union and several variants of Threshold Set-Union (Section 4.3). We then construct protocols for Set-Intersection, Cardinality Set-Intersection, and Over-Threshold Set-Union that are secure against malicious players in Section 4.4. To show that our techniques extend even beyond privacy-preserving set and multiset operations, we briefly describe protocols for several additional applications in Section 4.5. In Figure 4.1, we show the communication complexity of several of our protocols and compare their efficiency to that of previous work (see Section 2.1 for a more detailed discussion of previous work).


Problem                               Our solution      Previous solution       General MPC
Set-Intersection (HBC)                O(cnk lg|P|)      O(n^2 k lg|P|) [23]     O(n^2 k polylog(k) lg|P|)
Set-Intersection (Malicious)          O(n^2 k lg|P|)    none                    O(n^2 k polylog(k) lg|P|)
Cardinality Set-Intersection (HBC)    O(n^2 k lg|P|)    none                    O(n^2 k polylog(k) lg|P|)
Over-Threshold Set-Union (HBC)        O(n^2 k lg|P|)    none                    O(n^2 k polylog(nk) lg|P|)
Threshold Set-Union (HBC)             O(n^2 k lg|P|)    none                    O(n^2 k polylog(nk) lg|P|)
Subset (HBC)                          O(k lg|P|)        none                    O(k polylog(k) lg|P|)

Figure 4.1: Total communication complexity comparison for our multiparty protocols, previous solutions, and general multiparty computation. There are $n \ge 2$ players, $c < n$ of whom collude dishonestly, each with an input multiset of size $k$. The domain of the multiset elements is $P$. Security parameters are not included in the communication complexity.
4.1 Techniques and Mathematical Intuition


In  this  section,  we  introduce  our  techniques  for  privacy-preserving  computation  of  operations 
on  sets  and  multisets. 


Problem Setting. Let there be $n$ players. We denote the private input set of player $i$ as $S_i$, with $|S_i| = k$ ($1 \le i \le n$). We denote the $j$th element of set $i$ as $(S_i)_j$. We denote the domain of the elements in these sets as $P$, so that $(S_i)_j \in P$.

Let $R$ denote the plaintext domain $\mathrm{Dom}(E_{pk}(\cdot))$. In Paillier's cryptosystem, $R$ is $\mathbb{Z}_N$. We require that $R$ be large enough that an element $a$ drawn uniformly from $R$ has only negligible probability of representing an element of $P$ (denoted $a \in P$). For example, we could require that only elements of the form $b = a \,\|\, h(a)$ can represent an element in $P$, where $h(\cdot)$ denotes a cryptographic hash function [40]. That is, $b$ represents an element of $P$ only if there exists an $a$ of proper length such that $b = a \,\|\, h(a)$. If $|h(\cdot)| = \lg(1/\epsilon)$, then there is only an $\epsilon$ probability that $a' \leftarrow R$ represents an element in $P$.
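As an illustration of the $b = a \,\|\, h(a)$ convention, the sketch below packs an integer $a$ together with a truncated SHA-256 tag. Several parameters are illustrative assumptions, not values fixed by the text. These include the 40-bit tag length (so $\epsilon = 2^{-40}$), the 16-byte encoding of $a$, and the function names.

```python
import hashlib
import random

TAG_BITS = 40  # |h(.)| = lg(1/eps) with eps = 2**-40 (illustrative)
A_BYTES = 16   # assume domain elements a fit in 128 bits

def tag(a: int) -> int:
    """Truncated SHA-256 of a, used as h(a)."""
    digest = hashlib.sha256(a.to_bytes(A_BYTES, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - TAG_BITS)

def encode(a: int) -> int:
    """Map a domain element a to the ring element b = a || h(a)."""
    return (a << TAG_BITS) | tag(a)

def decode(b: int):
    """Return a if b = a || h(a) for some a of proper length, else None."""
    a, t = b >> TAG_BITS, b & ((1 << TAG_BITS) - 1)
    if a >= 1 << (8 * A_BYTES):
        return None
    return a if t == tag(a) else None

print(decode(encode(12345)))  # 12345
rng = random.Random(0)
hits = sum(decode(rng.randrange(2**120)) is not None for _ in range(10_000))
print(hits)  # expected number of hits is about 10**4 * 2**-40
```

A uniformly random ring element passes the check only if its low 40 bits happen to match the hash of its high bits. This happens with probability $2^{-40}$, which is the $\epsilon$ in the text.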

In  this  section,  we  first  give  background  on  polynomial  representation  of  multisets,  as  well  as 
the  mathematical  properties  of  polynomials  that  we  use  in  this  chapter.  We  then  introduce  our 
privacy-preserving  (in  a  TTP  setting)  multiset  operations  using  polynomial  representations, 
then  show  how  to  achieve  privacy  in  the  real  setting  by  computing  them  using  encrypted 
polynomials.  Finally,  we  overview  the  applications  of  these  techniques  explored  in  the  rest  of 
the  chapter. 


4.1.1  Background:  Polynomial  Rings  and  Polynomial  Representation  of  Sets 

The polynomial ring $R[x]$ consists of all polynomials with coefficients from $R$. Let $f, g \in R[x]$, such that $f(x) = \sum_{i=0}^{\deg(f)} f[i] x^i$, where $f[i]$ denotes the coefficient of $x^i$ in the polynomial $f$. Let $f + g$ denote the addition of $f$ and $g$, let $f * g$ denote the multiplication of $f$ and $g$, and let $f^{(d)}$ denote the $d$th formal derivative of $f$. Note that the formal derivative of $f$ is $\sum_{i=0}^{\deg(f)-1} (i+1) f[i+1] x^i$.
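The three operations above can be sketched on coefficient lists, ordered from lowest degree first. As a simplifying assumption, the sketch takes $R = \mathbb{Z}_p$ for the prime $p = 2^{61} - 1$ rather than Paillier's $\mathbb{Z}_N$. The function names are illustrative.

```python
PRIME = 2**61 - 1  # stand-in for the ring R (an assumption made for simplicity)

def poly_add(f, g, m=PRIME):
    """(f + g)[i] = f[i] + g[i], padding the shorter list with zeros."""
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % m for a, b in zip(f, g)]

def poly_mul(f, g, m=PRIME):
    """(f * g)[t] = sum over i + j = t of f[i] * g[j]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def poly_deriv(f, m=PRIME):
    """Formal derivative: coefficient i of f' is (i + 1) * f[i + 1]."""
    return [(i + 1) * f[i + 1] % m for i in range(len(f) - 1)] or [0]

f = [1, 2, 3]               # 1 + 2x + 3x^2
print(poly_deriv(f))        # [2, 6]
print(poly_mul(f, [0, 1]))  # [0, 1, 2, 3], i.e. x * f
```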


Polynomial Representation of Sets. In this chapter, we use polynomials to represent multisets. Given a multiset $S = \{S_j\}_{1 \le j \le k}$, we construct a polynomial representation of $S$, $f \in R[x]$, as $f(x) = \prod_{1 \le j \le k} (x - S_j)$. Conversely, given a polynomial $f \in R[x]$, we define the multiset $S$ represented by $f$ as follows: an element $a \in S$ if and only if (1) $f(a) = 0$ and (2) $a$ represents an element from $P$. Note that our polynomial representation naturally handles multisets. The element $a$ appears in the multiset $b$ times if $(x - a)^b \mid f \wedge (x - a)^{b+1} \nmid f$.
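A minimal sketch of this representation follows, again assuming $R = \mathbb{Z}_p$ with $p = 2^{61}-1$ for simplicity. The function represent builds $\prod_j (x - S_j)$. The function multiplicity finds the largest $b$ with $(x-a)^b \mid f$ by repeated synthetic division. The check that $a$ represents an element of $P$ (condition (2)) is omitted. All names are illustrative.

```python
PRIME = 2**61 - 1  # stand-in for R

def poly_mul(f, g, m=PRIME):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def represent(multiset, m=PRIME):
    """f(x) = prod_j (x - S_j), coefficients listed lowest degree first."""
    f = [1]
    for s in multiset:
        f = poly_mul(f, [(-s) % m, 1], m)  # multiply by (x - s)
    return f

def divide_linear(f, a, m=PRIME):
    """Synthetic division of f by (x - a); returns (quotient, remainder f(a))."""
    q, acc = [0] * (len(f) - 1), 0
    for i in range(len(f) - 1, 0, -1):
        acc = (acc * a + f[i]) % m
        q[i - 1] = acc
    return q, (acc * a + f[0]) % m

def multiplicity(f, a, m=PRIME):
    """Largest b such that (x - a)^b divides f (f assumed nonzero)."""
    b = 0
    while len(f) > 1:
        q, rem = divide_linear(f, a, m)
        if rem != 0:
            break
        f, b = q, b + 1
    return b

f = represent([3, 3, 5])
print([multiplicity(f, a) for a in (3, 4, 5)])  # [2, 0, 1]
```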

Note  that  previous  work  utilized  polynomials  to  represent  sets  [23]  (as  opposed  to  multisets). 
However,  to  the  best  of  our  knowledge,  no  operations  beyond  polynomial  evaluation  have  been 
employed to manipulate said polynomials. As a result, previous work is limited to set intersection and cannot be composed with other set operators. In this chapter, we propose a framework
to  perform  various  set  and  multiset  operations  using  polynomial  representations  and  construct 
efficient  privacy-preserving  set  operations  using  the  mathematical  properties  of  polynomials. 
By  utilizing  polynomial  representations  to  represent  sets  and  multisets,  our  framework  allows 
arbitrary  composition  of  multiset  operators  as  outlined  in  our  grammar. 
4.1.2 Our Techniques: Privacy-Preserving Multiset Operations

In this section, we construct algorithms for computing the polynomial representation of operations on sets, including union, intersection, and element reduction. We design these algorithms to be privacy-preserving in the following sense: the polynomial representation of any operation result reveals no more information than the set representation of the result. First, we introduce our algorithms for computing the polynomial representation of the set operations union, intersection, and element reduction (with a trusted third party). We then extend these techniques
to  encrypted  polynomials,  allowing  secure  implementation  of  our  techniques  without  a  trusted 
third  party.  Note  that  the  privacy-preserving  multiset  operations  defined  in  this  section  may 
be  arbitrarily  composed  (see  Section  4.5.1),  and  constitute  truly  general  techniques. 


Set  Operations  Using  Polynomial  Representations 

In this section, we introduce efficient techniques for multiset operations using polynomial representations. In particular, let $f, g$ be polynomial representations of the multisets $S$ and $T$, respectively. We describe techniques to compute the polynomial representation of their union, intersection, and element reduction. We design our techniques so that the polynomial representation of any operation result reveals no more information than the multiset representation
of  the  result.  We  formally  state  a  strong  privacy  property  for  each  operation  in  Theorems  1,  3, 
and  5. 


Union. We define the union of multisets $S \cup T$ as follows. Each element $a$ that appears $b_S \ge 0$ times in $S$ and $b_T \ge 0$ times in $T$ appears $b_S + b_T$ times in the resulting multiset. We compute the polynomial representation of $S \cup T$ as follows, where $f$ and $g$ are the polynomial representations of $S$ and $T$ respectively:

$$f * g.$$

Note that $f * g$ is a polynomial representation of $S \cup T$ for two reasons. (1) All elements that appear in either set $S$ or $T$ are preserved: $(f(a) = 0) \wedge (g(b) = 0) \Rightarrow ((f * g)(a) = 0) \wedge ((f * g)(b) = 0)$. (2) Since $f(a) = 0 \Leftrightarrow (x - a) \mid f$, duplicate elements from each multiset are preserved: $(f(a) = 0) \wedge (g(a) = 0) \Rightarrow (x - a)^2 \mid (f * g)$. In addition, we prove that, given $f * g$, one cannot learn more information about $S$ and $T$ than what can be deduced from $S \cup T$, as formally stated in the following theorem:

Theorem 1. Let TTP1 be a trusted third party which receives the private input multiset $S_i$ from player $i$ for $1 \le i \le n$, and then returns to every player the union multiset $S_1 \cup \cdots \cup S_n$ directly. Let TTP2 be another trusted third party, which receives the private input multiset $S_i$ from player $i$ for $1 \le i \le n$, and then: (1) calculates the polynomial representation $f_i$ for each $S_i$; (2) computes and returns to every player $\prod_{i=1}^{n} f_i$.

There exists a PPT translation algorithm such that, to each player, the results of the following two scenarios are distributed identically: (1) applying translation to the output of TTP1; (2) returning the output of TTP2 directly.

Proof.  Theorem  1  is  trivially  true.  (This  theorem  is  included  for  completeness.)  □ 
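Using the same coefficient-list conventions, and the same simplifying assumption $R = \mathbb{Z}_p$, the following sketch checks that multiplicities add under $f * g$, as the union definition requires. The helper names are illustrative.

```python
PRIME = 2**61 - 1  # stand-in for R

def poly_mul(f, g, m=PRIME):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def represent(multiset, m=PRIME):
    f = [1]
    for s in multiset:
        f = poly_mul(f, [(-s) % m, 1], m)  # multiply by (x - s)
    return f

def multiplicity(f, a, m=PRIME):
    """Largest b with (x - a)^b | f, by repeated synthetic division."""
    b = 0
    while len(f) > 1:
        q, acc = [0] * (len(f) - 1), 0
        for i in range(len(f) - 1, 0, -1):
            acc = (acc * a + f[i]) % m
            q[i - 1] = acc
        if (acc * a + f[0]) % m != 0:  # remainder f(a) is nonzero
            break
        f, b = q, b + 1
    return b

S, T = [1, 2, 2], [2, 3]
u = poly_mul(represent(S), represent(T))  # polynomial representation of S U T
print({a: multiplicity(u, a) for a in (1, 2, 3, 4)})  # {1: 1, 2: 3, 3: 1, 4: 0}
```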


Intersection. We define the intersection of multisets $S \cap T$ as follows. Each element $a$ that appears $b_S > 0$ times in $S$ and $b_T > 0$ times in $T$ appears $\min\{b_S, b_T\}$ times in the resulting multiset. Let $S$ and $T$ be two multisets of equal size, and let $f$ and $g$ be their respective polynomial representations (also of equal size). We compute the polynomial representation of $S \cap T$ as:

$$f * r + g * s$$

where $r, s \leftarrow R^{\deg(f)}[x]$. Here $R^b[x]$ is the set of all polynomials of degree $0, \ldots, b$ with coefficients chosen independently and uniformly from $R$. That is, $r = \sum_{i=0}^{\deg(f)} r[i] x^i$ and $s = \sum_{i=0}^{\deg(f)} s[i] x^i$, where $\forall_{0 \le i \le \deg(f)}\ r[i] \leftarrow R$ and $\forall_{0 \le i \le \deg(f)}\ s[i] \leftarrow R$.

We show below that $f * r + g * s$ is a polynomial representation of $S \cap T$. In addition, we prove that, given $f * r + g * s$, one cannot learn more information about $S$ and $T$ than what can be deduced from $S \cap T$, as formally stated in Theorem 3.

First,  we  must  prove  the  following  lemma,  based  on  our  definition  of  gcd  as  the  output  of 
Euclid’s  gcd  algorithm  (see  Lemma  19  in  Section  4.6): 

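Lemma 2. Let $f, g$ be polynomials in $R[x]$, where $R$ is a ring such that no PPT adversary can find the size of its subfields with non-negligible probability. Suppose $\deg(f) = \deg(g) = \alpha$, $\beta \ge \alpha$, $\gcd(f, g) = 1$, and $f[\deg(f)] \in R^* \wedge g[\deg(g)] \in R^*$. Let $r = \sum_{i=0}^{\beta} r[i] x^i$ and $s = \sum_{i=0}^{\beta} s[i] x^i$, where $\forall_{0 \le i \le \beta}\ r[i] \leftarrow R$ and $\forall_{0 \le i \le \beta}\ s[i] \leftarrow R$ (independently).

Let $u = f * r + g * s = \sum_{i=0}^{\alpha+\beta} u[i] x^i$. Then the coefficients $u[i]$ ($0 \le i \le \alpha+\beta$) are distributed uniformly and independently over $R$.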

We  prove  Lemma  2  in  Section  4.6. 

By this lemma, $f * r + g * s = \gcd(f, g) * u$, where $u$ is distributed uniformly in $R^{\gamma}[x]$ for $\gamma = 2\deg(f) - |S \cap T|$. Note that $a$ is a root of $\gcd(f, g)$ and $(x - a)^{\ell_a} \mid \gcd(f, g)$ if and only if $a$ appears $\ell_a$ times in $S \cap T$. Moreover, because $u$ is distributed uniformly in $R^{\gamma}[x]$, with overwhelming probability the roots of $u$ do not represent any element from $P$ (as explained at the beginning of Section 4.1). Thus, the computed polynomial $f * r + g * s$ is a polynomial representation of $S \cap T$. This technique for computing the intersection of two multisets can be extended in a similar manner to compute the intersection of an arbitrary number of multisets simultaneously. Also, given $f * r + g * s$, one cannot learn more information about $S$ and $T$ than what can be deduced from $S \cap T$, as formally stated in the following theorem:

Theorem 3. Let TTP1 be a trusted third party which receives the private input multiset $S_i$ of size $k$ from player $i$ for $1 \le i \le n$, and then returns to every player the intersection multiset $S_1 \cap \cdots \cap S_n$ directly. Let TTP2 be another trusted third party, which receives the private input multiset $S_i$ from player $i$ for $1 \le i \le n$, and then: (1) calculates the polynomial representation $f_i$ for each $S_i$; (2) chooses $r_i \leftarrow R^k[x]$; (3) computes and returns to each player $\sum_{i=1}^{n} f_i * r_i$.

There exists a PPT translation algorithm such that, to each player, the results of the following two scenarios are distributed identically: (1) applying translation to the output of TTP1; (2) returning the output of TTP2 directly.

Proof sketch. Let the output of TTP1 be denoted $T$. The translation algorithm operates as follows: (1) it calculates the polynomial representation $g$ of $T$; (2) it chooses the random polynomial $u \leftarrow R^{2k - |T|}[x]$; (3) it computes and returns $g * u$. □
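The intersection construction can be checked numerically with a short sketch. It assumes $R = \mathbb{Z}_p$ with $p = 2^{61}-1$. This is a field, so the unit conditions of Lemma 2 hold trivially for monic $f$ and $g$. The sketch draws $r$ and $s$ uniformly from $R^{\deg(f)}[x]$ and then reads off multiplicities. The function name intersect_poly is hypothetical.

```python
import random

PRIME = 2**61 - 1  # stand-in for R (a prime field)

def poly_add(f, g, m=PRIME):
    size = max(len(f), len(g))
    f, g = f + [0] * (size - len(f)), g + [0] * (size - len(g))
    return [(a + b) % m for a, b in zip(f, g)]

def poly_mul(f, g, m=PRIME):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def represent(multiset, m=PRIME):
    f = [1]
    for s in multiset:
        f = poly_mul(f, [(-s) % m, 1], m)  # multiply by (x - s)
    return f

def multiplicity(f, a, m=PRIME):
    """Largest b with (x - a)^b | f, by repeated synthetic division."""
    b = 0
    while len(f) > 1:
        q, acc = [0] * (len(f) - 1), 0
        for i in range(len(f) - 1, 0, -1):
            acc = (acc * a + f[i]) % m
            q[i - 1] = acc
        if (acc * a + f[0]) % m != 0:
            break
        f, b = q, b + 1
    return b

def intersect_poly(f, g, rng, m=PRIME):
    """Return f * r + g * s with r, s drawn uniformly from R^{deg f}[x]."""
    if len(f) != len(g):
        raise ValueError("the construction assumes multisets of equal size")
    r = [rng.randrange(m) for _ in range(len(f))]
    s = [rng.randrange(m) for _ in range(len(g))]
    return poly_add(poly_mul(f, r, m), poly_mul(g, s, m), m)

rng = random.Random(1)
S, T = [1, 2, 2, 4], [2, 2, 3, 4]
p = intersect_poly(represent(S), represent(T), rng)
print({a: multiplicity(p, a) for a in (1, 2, 3, 4)})  # {1: 0, 2: 2, 3: 0, 4: 1}
```

Elements of $S$ outside $T$ (here 1) survive as roots only if the random $s$ vanishes there. This happens with probability about $1/p$, which mirrors the "overwhelming probability" argument above.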


Element Reduction. We define the operation of element reduction (by $d$) of a multiset $S$, denoted $\mathrm{Rd}_d(S)$, as follows. Each element $a$ that appears $b$ times in $S$ appears $\max\{b - d, 0\}$ times in the resulting multiset. We compute the polynomial representation of $\mathrm{Rd}_d(S)$ as:

$$\sum_{j=0}^{d} f^{(j)} * F_j * r_j$$

where $r_j \leftarrow R^{\deg(f)}[x]$ ($0 \le j \le d$). Each $F_j$ is any polynomial of degree $j$ such that $\forall_{a \in P}\ F_j(a) \ne 0$ ($0 \le j \le d$) and $\gcd(F_0, \ldots, F_d) = 1$. Note that random polynomials of degree $0, \ldots, d$ in $R[x]$ have these properties with overwhelming probability.

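To show that the formal derivative operation allows element reduction, we require the following lemma:

Lemma 4. Let $F_j \in R[x]$ ($0 \le j \le d$) each be of degree $j$, such that $\gcd(F_0, \ldots, F_d) = 1$. Consider all elements $a \in R$ such that $\forall_{0 \le j \le d}\ (x - a) \nmid F_j$, all $q \in R[x]$ such that $(x - a) \nmid q$, and $r_j \leftarrow R^{\deg(f)}[x]$ ($0 \le j \le d$). Then:

• if $m > d$, $f = (x - a)^m * q \Rightarrow (x - a)^{m-d} \mid \sum_{j=0}^{d} f^{(j)} * F_j * r_j \wedge (x - a)^{m-d+1} \nmid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$

• if $m \le d$, $f = (x - a)^m * q \Rightarrow (x - a) \nmid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$

with overwhelming probability.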

We prove this lemma in Section 4.6. By Lemma 2, $\sum_{j=0}^{d} f^{(j)} * F_j * r_j = \gcd(f^{(d)}, f^{(d-1)}, \ldots, f) * u$, where $u$ is distributed uniformly in $R^{\gamma}[x]$ for $\gamma = 2k - |\mathrm{Rd}_d(S)|$. Thus, with overwhelming probability, no root of $u$ represents an element from $P$. Therefore, $\sum_{j=0}^{d} f^{(j)} * F_j * r_j$ is a polynomial representation of $\mathrm{Rd}_d(S)$. Moreover, given $\sum_{j=0}^{d} f^{(j)} * F_j * r_j$, one cannot learn more information about $S$ than what can be deduced from $\mathrm{Rd}_d(S)$, as formally stated in the following theorem:

Theorem 5. Let $F_j$ ($0 \le j \le d$) be publicly known polynomials of degree $j$ such that $\forall_{a \in P}\ F_j(a) \ne 0$ and $\gcd(F_0, \ldots, F_d) = 1$. Let TTP1 be a trusted third party which receives a private input multiset $S$ of size $k$, and then returns the reduction multiset $\mathrm{Rd}_d(S)$ directly. Let TTP2 be another trusted third party, which receives a private input multiset $S$, and then: (1) calculates the polynomial representation $f$ of $S$; (2) chooses $r_0, \ldots, r_d \leftarrow R^k[x]$; (3) computes and returns $\sum_{j=0}^{d} f^{(j)} * F_j * r_j$.

There exists a PPT translation algorithm such that the results of the following two scenarios are distributed identically: (1) applying translation to the output of TTP1; (2) returning the output of TTP2 directly.

Proof sketch. Let the output of TTP1 be denoted $T$. The translation algorithm operates as follows: (1) it calculates the polynomial representation $g$ of $T$; (2) it chooses the random polynomial $u \leftarrow R^{2k - |T|}[x]$; (3) it computes and returns $g * u$. □
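The following is a numerical sketch of $\sum_{j=0}^{d} f^{(j)} * F_j * r_j$. It assumes $R = \mathbb{Z}_p$ with $p = 2^{61}-1$. Each $F_j$ is drawn as a random polynomial of degree exactly $j$, which, per the text, satisfies the required conditions with overwhelming probability. The names are illustrative. Over $\mathbb{Z}_p$ the formal derivative lowers root multiplicities as expected only for multiplicities below $p$. This is not a concern at these sizes.

```python
import random

PRIME = 2**61 - 1  # stand-in for R

def poly_add(f, g, m=PRIME):
    size = max(len(f), len(g))
    f, g = f + [0] * (size - len(f)), g + [0] * (size - len(g))
    return [(a + b) % m for a, b in zip(f, g)]

def poly_mul(f, g, m=PRIME):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def poly_deriv(f, m=PRIME):
    return [(i + 1) * f[i + 1] % m for i in range(len(f) - 1)] or [0]

def represent(multiset, m=PRIME):
    f = [1]
    for s in multiset:
        f = poly_mul(f, [(-s) % m, 1], m)
    return f

def multiplicity(f, a, m=PRIME):
    b = 0
    while len(f) > 1:
        q, acc = [0] * (len(f) - 1), 0
        for i in range(len(f) - 1, 0, -1):
            acc = (acc * a + f[i]) % m
            q[i - 1] = acc
        if (acc * a + f[0]) % m != 0:
            break
        f, b = q, b + 1
    return b

def reduce_poly(f, d, rng, m=PRIME):
    """sum_{j=0}^{d} f^(j) * F_j * r_j with random F_j and r_j <- R^{deg f}[x]."""
    k = len(f) - 1
    total, fj = [0], f
    for j in range(d + 1):
        F = [rng.randrange(m) for _ in range(j)] + [rng.randrange(1, m)]  # degree exactly j
        r = [rng.randrange(m) for _ in range(k + 1)]
        total = poly_add(total, poly_mul(poly_mul(fj, F, m), r, m), m)
        fj = poly_deriv(fj, m)  # next formal derivative f^(j+1)
    return total

rng = random.Random(5)
f = represent([1, 1, 1, 2, 3, 3])
p = reduce_poly(f, d=1, rng=rng)  # should represent Rd_1(S) = {1, 1, 3}
print({a: multiplicity(p, a) for a in (1, 2, 3)})  # {1: 2, 2: 0, 3: 1}
```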

Operations  with  Encrypted  Polynomials 

In  the  previous  section,  we  prove  the  security  of  our  polynomial-based  multiset  operators  when 
the  polynomial  representation  of  the  result  is  computed  by  a  trusted  third  party  (TTP2). 
By using additively homomorphic encryption, we allow these results to be implemented as protocols in the real world without a trusted third party (i.e., the polynomial representation of the set operations is computed by the parties collectively without a trusted third party). The algorithms given above use three basic polynomial operations: addition, multiplication, and the formal derivative. In this section we give algorithms for computing these operations on encrypted polynomials.

For $f \in R[x]$, we represent the encryption of polynomial $f$, $E_{pk}(f)$, as the ordered list of the encryptions of its coefficients under the additively homomorphic cryptosystem: $E_{pk}(f[0]), \ldots, E_{pk}(f[\deg(f)])$. Let $f_1$, $f_2$, and $g$ be polynomials in $R[x]$ such that $f_1(x) = \sum_{i=0}^{\deg(f_1)} f_1[i] x^i$, $f_2(x) = \sum_{i=0}^{\deg(f_2)} f_2[i] x^i$, and $g(x) = \sum_{i=0}^{\deg(g)} g[i] x^i$. Let $a, b \in R$. Using the homomorphic properties of the cryptosystem, we can efficiently perform the following operations on encrypted polynomials without knowledge of the private key:


• Sum of encrypted polynomials: given the encryptions of the polynomials $f_1$ and $f_2$, we can efficiently compute the encryption of the polynomial $g := f_1 + f_2$ by calculating $E_{pk}(g[i]) := E_{pk}(f_1[i]) +_h E_{pk}(f_2[i])$ ($0 \le i \le \max\{\deg(f_1), \deg(f_2)\}$).

• Product of an unencrypted polynomial and an encrypted polynomial: given a polynomial $f_2$ and the encryption of polynomial $f_1$, we can efficiently compute the encryption of polynomial $g := f_1 * f_2$ (also denoted $f_2 *_h E_{pk}(f_1)$). We do so by calculating the encryption of each coefficient, $E_{pk}(g[i]) := (f_2[0] \times_h E_{pk}(f_1[i])) +_h (f_2[1] \times_h E_{pk}(f_1[i-1])) +_h \cdots +_h (f_2[i] \times_h E_{pk}(f_1[0]))$ ($0 \le i \le \deg(f_1) + \deg(f_2)$). Terms whose indices exceed $\deg(f_1)$ or $\deg(f_2)$ are omitted.

• Derivative of an encrypted polynomial: given the encryption of polynomial $f_1$, we can efficiently compute the encryption of polynomial $g := \frac{d}{dx} f_1$ by calculating the encryption of each coefficient, $E_{pk}(g[i]) := (i + 1) \times_h E_{pk}(f_1[i + 1])$ ($0 \le i \le \deg(f_1) - 1$).

• Evaluation of an encrypted polynomial at an unencrypted point: given the encryption of polynomial $f_1$, we can efficiently compute the encryption of $a := f_1(b)$ by calculating $E_{pk}(a) := (b^0 \times_h E_{pk}(f_1[0])) +_h (b^1 \times_h E_{pk}(f_1[1])) +_h \cdots +_h (b^{\deg(f_1)} \times_h E_{pk}(f_1[\deg(f_1)]))$.
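The four operations can be exercised end to end with a toy Paillier instance. This instance uses insecure parameter sizes and omits threshold decryption and the re-randomization after each operation. Encrypted polynomials are lists of ciphertexts, ordered from lowest degree first. All function names are hypothetical, and decryption appears only to check the results.

```python
import math
import random

# Toy Paillier (insecure sizes), generator g = N + 1; plaintext ring R = Z_N.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
rng = random.Random(7)

def enc(m):
    return (1 + (m % N) * N) * pow(rng.randrange(1, N), N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add_h(c1, c2):  # E(a) +_h E(b) = E(a + b)
    return c1 * c2 % N2

def smul_h(k, c):   # k x_h E(a) = E(k * a)
    return pow(c, k % N, N2)

def enc_poly(f):
    return [enc(a) for a in f]

def dec_poly(E):
    return [dec(c) for c in E]

def sum_h(E1, E2):
    """Coefficient-wise +_h, padding the shorter list with fresh encryptions of 0."""
    size = max(len(E1), len(E2))
    E1 = E1 + [enc(0) for _ in range(size - len(E1))]
    E2 = E2 + [enc(0) for _ in range(size - len(E2))]
    return [add_h(a, b) for a, b in zip(E1, E2)]

def mul_plain_h(f2, E1):
    """f2 *_h E(f1): coefficient i+j accumulates f2[j] x_h E(f1[i])."""
    out = [enc(0) for _ in range(len(E1) + len(f2) - 1)]
    for i, c in enumerate(E1):
        for j, b in enumerate(f2):
            out[i + j] = add_h(out[i + j], smul_h(b, c))
    return out

def deriv_h(E1):
    """(i + 1) x_h E(f1[i + 1]) for 0 <= i <= deg(f1) - 1."""
    return [smul_h(i + 1, E1[i + 1]) for i in range(len(E1) - 1)] or [enc(0)]

def eval_h(E1, b):
    """E(f1(b)) as the +_h-sum of b^i x_h E(f1[i])."""
    acc = enc(0)
    for i, c in enumerate(E1):
        acc = add_h(acc, smul_h(pow(b, i, N), c))
    return acc

f1, f2 = [1, 2, 3], [5, 1]  # 1 + 2x + 3x^2 and 5 + x
E1, E2 = enc_poly(f1), enc_poly(f2)
print(dec_poly(sum_h(E1, E2)))        # [6, 3, 3]
print(dec_poly(mul_plain_h(f2, E1)))  # [5, 11, 17, 3]
print(dec_poly(deriv_h(E1)))          # [2, 6]
print(dec(eval_h(E1, 2)))             # 17
```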

Utilizing  the  above  operations  on  encrypted  polynomials,  we  can  securely  compute  results 
according  to  the  multiset  operations  described  in  Section  4.1.2  without  the  trusted  third  party 
(TTP2).  We  demonstrate  this  property  with  concrete  examples  detailed  in  the  remainder  of 
this  chapter. 


4.1.3  Overview  of  Applications 

The  techniques  we  introduce  for  privacy-preserving  computations  of  multiset  operations  have 
many  applications.  We  give  several  concrete  examples  that  utilize  our  techniques  for  specific 
privacy-preserving  functions  on  multisets  in  the  following  sections. 

First,  we  design  efficient  protocols  for  the  Set-Intersection  and  Cardinality  Set-Intersection 
problems,  secure  against  honest-but-curious  adversaries  (Section  4.2).  We  then  provide  an 
efficient  protocol  for  the  Over-Threshold  Set-Union  problem,  as  well  as  three  variants  of  the 
Threshold  Set-Union  problem,  secure  against  honest-but-curious  adversaries,  in  Section  4.3.  We 
introduce tools and protocols, secure against malicious players, for the Set-Intersection, Cardinality Set-Intersection, and Over-Threshold Set-Union problems in Section 4.4. We propose an efficient protocol for the Subset problem in Section 4.5.2.
Protocol: Set-Intersection-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ of whom collude dishonestly, each with a private input set $S_i$ such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key of a homomorphic cryptosystem.

Output: Each player determines $S_1 \cap \cdots \cap S_n$.

1. Each player $i = 1, \ldots, n$

(a) calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends the encryption of the polynomial $f_i$ to players $i + 1, \ldots, i + c$

(c) chooses $c + 1$ polynomials $r_{i,0}, \ldots, r_{i,c} \leftarrow R^k[x]$

(d) calculates the encryption of the polynomial $\phi_i = f_{i-c} * r_{i,c} + \cdots + f_{i-1} * r_{i,1} + f_i * r_{i,0}$, utilizing the algorithms given in Sec. 4.1.2.

2. Player 1 sends the encryption of the polynomial $\lambda_1 = \phi_1$ to player 2

3. Each player $i = 2, \ldots, n$ in turn

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} + \phi_i$ by utilizing the algorithms given in Sec. 4.1.2

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$ to all other players.

5. All players perform a group decryption to obtain the polynomial $p$.

Each player $i = 1, \ldots, n$ determines the intersection multiset as follows. For each $a \in S_i$, he calculates $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the intersection multiset.

Figure 4.2: Set-Intersection protocol secure against honest-but-curious adversaries.
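To make the index arithmetic of Fig. 4.2 concrete, here is a plaintext walk-through of its dataflow. It is only a sketch under simplifying assumptions. Encryption and group decryption are elided, so all values are in the clear. It takes $R = \mathbb{Z}_p$ with $p = 2^{61}-1$. Players are indexed from 0 rather than 1, and indices wrap modulo $n$. The function name set_intersection_hbc is hypothetical. The sketch therefore illustrates correctness, not privacy.

```python
import random

PRIME = 2**61 - 1  # stand-in for R

def poly_add(f, g, m=PRIME):
    size = max(len(f), len(g))
    f, g = f + [0] * (size - len(f)), g + [0] * (size - len(g))
    return [(a + b) % m for a, b in zip(f, g)]

def poly_mul(f, g, m=PRIME):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def represent(multiset, m=PRIME):
    f = [1]
    for s in multiset:
        f = poly_mul(f, [(-s) % m, 1], m)
    return f

def multiplicity(f, a, m=PRIME):
    b = 0
    while len(f) > 1:
        q, acc = [0] * (len(f) - 1), 0
        for i in range(len(f) - 1, 0, -1):
            acc = (acc * a + f[i]) % m
            q[i - 1] = acc
        if (acc * a + f[0]) % m != 0:
            break
        f, b = q, b + 1
    return b

def set_intersection_hbc(sets, c, rng, m=PRIME):
    n, k = len(sets), len(sets[0])
    assert 0 <= c < n and all(len(s) == k for s in sets)
    f = [represent(s, m) for s in sets]                      # step 1(a)
    # step 1(c): player i draws r[i][0..c], each uniform in R^k[x]
    r = [[[rng.randrange(m) for _ in range(k + 1)] for _ in range(c + 1)]
         for _ in range(n)]
    phi = []
    for i in range(n):                                       # step 1(d)
        acc = [0]
        for j in range(c + 1):
            acc = poly_add(acc, poly_mul(f[(i - j) % n], r[i][j], m), m)
        phi.append(acc)
    lam = phi[0]                                             # step 2
    for i in range(1, n):                                    # step 3: lambda_i = lambda_{i-1} + phi_i
        lam = poly_add(lam, phi[i], m)
    p = lam                                                  # steps 4-5 (decryption elided)
    results = []
    for s in sets:                                           # each player tests his own elements
        mult = {a: multiplicity(p, a, m) for a in set(s)}
        results.append({a: b for a, b in mult.items() if b > 0})
    return results

out = set_intersection_hbc([[1, 2, 2, 5], [2, 2, 3, 5], [2, 5, 5, 7]], c=1,
                           rng=random.Random(3))
print(out)  # [{2: 1, 5: 1}, {2: 1, 5: 1}, {2: 1, 5: 1}]
```

Summing $\phi_i$ over all players regroups the terms as $\sum_i f_i * \sum_{j=0}^{c} r_{i+j,j}$. This is why step 4 describes $p$ in that form.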


More  generally,  our  techniques  allow  private  computation  of  functions  based  on  composition 
of  the  union,  intersection,  and  element  reduction  operators.  We  discuss  techniques  for  this 
general  private  computation  on  multisets  in  Section  4.5.1. 

Our  techniques  are  widely  applicable,  even  outside  the  realm  of  computation  of  functions 
over  multisets.  As  an  example,  we  show  how  to  apply  our  techniques  to  private  evaluation  of 
boolean  formulae  in  CNF  form  in  Section  4.5.3. 


4.2  Application  I:  Private  Set-Intersection  and  Cardinality  Set- 
Intersection 

In  this  section,  we  design  protocols  for  Set-Intersection  and  Cardinality  Set-Intersection  secure 
against  a  coalition  of  honest-but-curious  adversaries. 

4.2.1  Set-Intersection 

Problem Definition. Let there be $n$ parties, each with a private input set $S_i$ ($1 \le i \le n$) of size $k$. We define the Set-Intersection problem as follows: all players learn the intersection of all private input multisets without gaining any other information. That is, each player learns $S_1 \cap S_2 \cap \cdots \cap S_n$.

Our protocol secure against honest-but-curious adversaries is given in Fig. 4.2. In this protocol, each player $i$ ($1 \le i \le n$) first calculates a polynomial representation $f_i \in R[x]$ of his input multiset $S_i$. He then encrypts this polynomial $f_i$ and sends it to $c$ other players, $i + 1, \ldots, i + c$. For each encrypted polynomial $E_{pk}(f_i)$, each player $i + j$ ($0 \le j \le c$) chooses a random polynomial $r_{i+j,j} \in R^k[x]$. Since at most $c$ players may collude, $\sum_{j=0}^{c} r_{i+j,j}$ is both uniformly distributed and known to no player. The players then compute the encrypted polynomial $\left(\sum_{j=0}^{c} r_{i+j,j}\right) *_h E_{pk}(f_i)$. From these encrypted polynomials, the players compute the encryption of $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$. All players then engage in group decryption to obtain the polynomial $p$. Thus, by Theorem 3, the players have privately computed $p$, a polynomial representing the intersection of their private input multisets. Finally, to reconstruct the multiset represented by $p$, player $i$ calculates, for each $a \in S_i$, the value $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the intersection multiset.


Security Analysis. We show that our protocol is correct, as each player learns the appropriate answer set at its termination, and secure in the honest-but-curious model, as no player
gains  information  that  it  would  not  gain  when  using  its  input  in  the  ideal  model.  A  formal 
statement  of  these  properties  is  as  follows: 

Theorem 6. In the Set-Intersection protocol of Fig. 4.2, every player learns the intersection of all players' private inputs, $S_1 \cap S_2 \cap \cdots \cap S_n$, with overwhelming probability.

Proof. Each player learns the decrypted polynomial $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$. If $\forall_{i \in [n]}\ f_i(a) = 0$, then $p(a) = 0$. Every element of the set-intersection lies in every player's private input. Hence each player can recover all elements of the set-intersection by testing his own elements: each element of his private input that is a root of $p$ is a member of the intersection set.

We now show that, with high probability, erroneous elements are not inserted into the answer set. By the reasoning of Lemma 19, all coefficients of $f_i$ ($1 \le i \le n$) are in the set $R^* \cup \{0\}$. Thus, by Lemma 2, the decrypted polynomial is of the form $\left(\prod_{a \in I} (x - a)\right) * s$, where $I = S_1 \cap \cdots \cap S_n$ and $s$ is uniformly distributed over $R^{2k - |I|}[x]$. This random polynomial $s$ has polynomial degree, and thus has a polynomial number of roots. Each of these roots represents an element from $P$ with only negligible probability. Thus, the probability that an erroneous element is included in the answer set is also negligible, and all players learn exactly the intersection set. □

Theorem 7. Assume that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure. Then, with overwhelming probability, in the Set-Intersection protocol of Fig. 4.2, any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the homomorphic cryptosystem $(E, D)$ used in the protocol is secure as required. Thus, as the inputs of the other players remain encrypted until the decryption is performed, nothing can be learned by any player before that point. Each player then learns only the summed polynomial $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$.

Note that to every coalition of $c$ players, for every $i$, $\sum_{j=0}^{c} r_{i+j,j}$ is completely random: at least one of the $c + 1$ players who chose the random polynomials in that sum is not a member of the coalition, and so $\sum_{j=0}^{c} r_{i+j,j}$ is uniformly distributed and unknown to the coalition.

Note that, by the reasoning of Lemma 19, all coefficients of $f_i$ ($1 \le i \le n$) are in the set $R^* \cup \{0\}$. Thus, by Lemma 2, $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right) = \left(\prod_{a \in I}(x - a)\right) * s$, where $I$ is the intersection set and $s$ is uniformly distributed over the polynomials of appropriate degree.
Protocol: Cardinality-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key of a homomorphic cryptosystem.

Output: Each player determines $|S_1 \cap \cdots \cap S_n|$.

1. Each player $i = 1, \ldots, n$

(a) calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends the encryption of the polynomial $f_i$ to players $i + 1, \ldots, i + c$

(c) chooses $c + 1$ random polynomials $r_{i,0}, \ldots, r_{i,c} \leftarrow R^k[x]$

(d) calculates the encryption of the polynomial $\phi_i = f_{i-c} * r_{i,c} + \cdots + f_{i-1} * r_{i,1} + f_i * r_{i,0}$, utilizing the algorithms given in Sec. 4.1.2.

2. Player 1 sends the encrypted polynomial $\lambda_1 = \phi_1$ to player 2

3. Each player $i = 2, \ldots, n$ in turn

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} + \phi_i$ by utilizing the algorithms given in Sec. 4.1.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$ to all other players.

5. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $p$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = p((S_i)_j)$, using the algorithm given in Sec. 4.1.2.

(b) for each $j = 1, \ldots, k$ chooses a random number $r_{ij} \leftarrow R$ and calculates an encrypted element $(V_i)_j = r_{ij} \times_h E_{pk}(c_{ij})$

6. All players perform the Shuffle protocol on their private input sets $V_i$, obtaining a joint set $V$, in which all ciphertexts have been re-randomized.

7. All players $1, \ldots, n$ decrypt each element of the shuffled set $V$.

If $nb$ of the decrypted elements from $V$ are 0, then the size of the set intersection is $b$.

Figure  4.3:  Cardinality  set-intersection  protocol  secure  against  honest-but-curious  adversaries. 


Thus  no  information  about  the  private  inputs  of  the  honest  players  can  be  recovered  from  p, 
other  than  that  given  by  revealing  the  intersection  set.  □ 


4.2.2  Cardinality  Set-Intersection 

Problem Definition. We define the Cardinality Set-Intersection problem on sets as follows: each player learns the number of unique elements in $S_1 \cap \cdots \cap S_n$, without learning any other information. A variant of this problem is the Cardinality Set-Intersection problem on multisets, which we define as follows: all players learn $|S_1 \cap \cdots \cap S_n|$ as computed on multisets.

Our protocol for Cardinality Set-Intersection, given in Figure 4.3, proceeds as our protocol for Set-Intersection until the point where all players learn the encryption of $p$, the polynomial representation of $S_1 \cap \cdots \cap S_n$. Each player $i = 1, \ldots, n$ then evaluates this encrypted polynomial at each unique element $a \in S_i$, obtaining $\beta_a$, an encryption of $p(a)$. He then blinds each encrypted evaluation $p(a)$ by calculating $\beta'_a = b_a \times_h \beta_a$ for a random element $b_a$. All players then distribute and shuffle the ciphertexts constructed by each player, such that all players receive all ciphertexts without learning their source. The Shuffle protocol can be constructed from standard techniques [13, 17, 24, 31, 44], with communication complexity at most $O(n^2 k)$. The players then decrypt these ciphertexts, finding that $nb$ of the decryptions are 0, implying that there are $b$ unique elements in $S_1 \cap \cdots \cap S_n$. FNP utilize a variation of this technique [23], but it is not obvious how to construct a multiparty Cardinality Set-Intersection protocol from their techniques.
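The counting step can be illustrated with a plaintext sketch. It assumes a hypothetical prime field $\mathbb{Z}_Q$ as a stand-in for $R$, replaces the encrypted evaluations $\beta_a$ with plain values, draws each blinding factor $b_a$ from the nonzero elements (so a zero evaluation stays zero and a nonzero one stays nonzero), and replaces the Shuffle protocol with a local random permutation. The function names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for R


def poly_from_set(elems):
    """Coefficients (lowest degree first) of prod_{a in elems} (x - a) mod Q."""
    coeffs = [1]
    for a in elems:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % Q
            nxt[i + 1] = (nxt[i + 1] + c) % Q
        coeffs = nxt
    return coeffs


def poly_eval(f, x):
    acc = 0
    for c in reversed(f):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def combine(sets, rng):
    """Plaintext p = sum_i f_i * r_i; assumes every |S_i| = k, r_i of degree k."""
    k = len(sets[0])
    p = [0] * (2 * k + 1)
    for s in sets:
        f = poly_from_set(s)
        r = [rng.randrange(Q) for _ in range(k + 1)]
        for i, a in enumerate(f):
            for j, b in enumerate(r):
                p[i + j] = (p[i + j] + a * b) % Q
    return p


def cardinality(sets, rng):
    p = combine(sets, rng)
    blinded = []
    for s in sets:
        for a in set(s):  # each unique element a of S_i
            b_a = rng.randrange(1, Q)  # nonzero blinding factor
            blinded.append(b_a * poly_eval(p, a) % Q)
    rng.shuffle(blinded)  # local stand-in for the Shuffle protocol
    zeros = sum(1 for v in blinded if v == 0)
    return zeros // len(sets)  # n * b zeros means b common elements


sets = [[3, 5, 7, 11], [5, 7, 13, 17], [2, 5, 7, 19]]
print(cardinality(sets, random.Random(7)))  # 2 with overwhelming probability
```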


Variants. Our protocol can be simply extended to privately compute the Cardinality Set-Intersection problem on multisets by utilizing the following encoding: any element $a$ that appears $b$ times in a multiset is encoded as the set $\{a \| 1, \ldots, a \| b\}$, in which each element appears only once. Note that this set is of the same size as the original multiset representation, so this variant preserves the efficiency of our protocol.
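The multiset encoding can be sketched in a few lines of Python. Here the tuple `(a, i)` stands in for the concatenation $a \| i$; how such a pair is mapped into the plaintext domain of the cryptosystem is left open and is an assumption of the sketch. The example checks that the encoded sets intersect in exactly as many items as the multisets do.

```python
from collections import Counter


def encode_multiset(multiset):
    """Encode the i-th copy of element a as the distinct item (a, i).

    An element appearing b times becomes {(a, 1), ..., (a, b)}, so the
    encoded set has the same size as the multiset.
    """
    seen = Counter()
    encoded = []
    for a in multiset:
        seen[a] += 1
        encoded.append((a, seen[a]))
    return encoded


A = ["x", "x", "y"]
B = ["x", "x", "x", "z"]
common = set(encode_multiset(A)) & set(encode_multiset(B))
print(sorted(common))  # [('x', 1), ('x', 2)]
# Matches the multiset intersection size min(2, 3) + 0 = 2.
print(len(common), sum((Counter(A) & Counter(B)).values()))  # 2 2
```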


Security  Analysis.  We  show  that  our  protocol  is  correct,  as  each  player  learns  the  size  of  the 
answer  set  at  its  termination,  and  secure  in  the  honest-but-curious  model,  as  no  player  gains 
information  that  it  would  not  gain  when  using  its  input  in  the  ideal  model.  A  formal  statement 
of  these  properties  is  as  follows: 

Theorem 8. In the Cardinality Set-Intersection protocol of Fig. 4.3, every player learns the size of the intersection of all players' private inputs, $|S_1 \cap S_2 \cap \cdots \cap S_n|$, with overwhelming probability.

Proof. Note that, following the proof of Theorem 6, $p$ is a polynomial representation of the intersection multiset, with overwhelming probability. Each player evaluates $p$ (encrypted) at each of their inputs, then blinds it by homomorphically multiplying the encrypted evaluation by a random element. Thus each resulting encrypted element $(V_i)_j$ ($1 \le i \le n$, $1 \le j \le k$) is either 0, representing some element of a private input set in the intersection set, or uniformly distributed, representing some element not in the intersection set. An element is a member of $S_1 \cap \cdots \cap S_n$ if and only if each player holds it as part of their private input set, so for each element of $S_1 \cap \cdots \cap S_n$ there are $n$ encrypted evaluations that are 0. Thus, when the encrypted evaluations $(V_i)_j$ ($1 \le i \le n$, $1 \le j \le k$) are shuffled and decrypted, there are exactly $n|S_1 \cap \cdots \cap S_n|$ 0s, and thus all players learn the size of the intersection set. □

Theorem 9. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure and that the Shuffle protocol is secure, with overwhelming probability, in the Cardinality Set-Intersection protocol of Fig. 4.3, any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the cryptosystem $E_{pk}(\cdot)$ and the Shuffle protocol are secure, so no player or coalition of players learns any information from the protocol except the decryption of the randomly-ordered set $\{(V_i)_j\}_{i \in [n], j \in [k]}$. As each element of that set is either 0 or a uniformly distributed element, it conveys no information other than the statement 'some player had an element in their private input set that was/was not in the intersection set'. As this information precisely constitutes the result of the Cardinality Set-Intersection problem, no additional information is revealed. □

4.2.3  Malicious  Case 

We  can  extend  our  protocols  in  Figures  4.2  and  4.3,  secure  against  honest-but-curious  players, 
to  protocols  secure  against  malicious  adversaries  by  adding  zero-knowledge  proofs  or  using 
cut-and-choose  to  ensure  security.  We  give  details  of  our  protocols  secure  against  malicious 
adversaries  in  Section  4.4.2. 
Protocol: Over-Threshold Set-Union-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key for a homomorphic cryptosystem. The threshold number of repetitions at which an element appears in the output is $t$. $F_0, \ldots, F_{t-1}$ are fixed polynomials of degree $0, \ldots, t-1$ which have no common factors or roots representing elements of $P$.

Output: Each player determines $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the algorithm given in Sec. 4.1.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players $2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the first through $(t-1)$st derivatives of $p$, denoted $p^{(1)}, \ldots, p^{(t-1)}$, by repeating the algorithm given in Sec. 4.1.2.

(b) chooses random polynomials $r_{i,0}, \ldots, r_{i,t-1} \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * r_{i,\ell}$ and sends it to all other players.

6. All players perform a group decryption to obtain the polynomial $\Phi = \sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$

7. Each player $i = 1, \ldots, n$, for each $j = 1, \ldots, k$

(a) chooses a random element $b_{ij} \leftarrow R$

(b) calculates $U_{ij} = b_{ij} \times \Phi((S_i)_j) + (S_i)_j$

8. All players $i = 1, \ldots, n$ perform the Shuffle protocol on the elements $U_{ij}$ ($1 \le j \le k$), such that each player obtains a joint set $V$.

Each element $a \in P$ that appears $b$ times in $V$ is an element in the threshold set that appears $b$ times in the players' private inputs.

Figure  4.4:  Over-Threshold  Set-Union  protocol  secure  against  honest-but-curious  adversaries. 

4.3  Application  II:  Private  Over-Threshold  Set-Union  and 
Threshold  Set-Union 

In  this  section,  we  design  protocols  for  the  Over-Threshold  Set-Union  problem  and  several 
variations  of  the  Threshold  Set-Union  problem,  secure  against  a  coalition  of  honest-but-curious 
adversaries. 


4.3.1  Over-Threshold  Set-Union  Protocol 

Problem Definition. Let there be $n$ players; each has a private input set $S_i$ ($1 \le i \le n$) of size $k$. We define the Over-Threshold Set-Union problem as follows: all players learn which elements appear in the union of the players' private input multisets at least a threshold number $t$ times, and the number of times these elements appeared in the union of players' private inputs, without gaining any other information. For example, assume that $a$ appears in the combined private input of the players 15 times. If $t = 10$, then all players learn that $a$ has appeared 15 times. However, if $t = 16$, then no player learns that $a$ appears in any player's private input. This problem can be represented as $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

We describe our protocol secure against honest-but-curious players for the Over-Threshold Set-Union problem in Fig. 4.4. In this protocol, each player $i$ ($1 \le i \le n$) first calculates $f_i$, the polynomial representation of its input multiset $S_i$. All players then compute the encryption of the polynomial $p = \prod_{i=1}^{n} f_i$, the polynomial representation of $S_1 \cup \cdots \cup S_n$. Players $i = 1, \ldots, c + 1$ then each choose random polynomials $r_{i,0}, \ldots, r_{i,t-1}$, and calculate the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * r_{i,\ell}$ as shown in Fig. 4.4. All players then calculate the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$ and perform a group decryption to obtain $\Phi$.

As at most $c$ players may dishonestly collude, the polynomials $\sum_{i=1}^{c+1} r_{i,\ell}$ ($0 \le \ell \le t-1$) are uniformly distributed and known to no player. By Theorem 5, $\Phi$ is a polynomial representation of $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

Each player $i = 1, \ldots, n$ then chooses $b_{ij} \leftarrow R$ and computes $U_{ij} = b_{ij} \times \Phi((S_i)_j) + (S_i)_j$ ($1 \le j \le k$). Each element $U_{ij}$ equals $(S_i)_j$ if $(S_i)_j \in Rd_{t-1}(S_1 \cup \cdots \cup S_n)$, and is otherwise uniformly distributed over $R$. The players then shuffle these elements $U_{ij}$, such that each player learns all of the elements but does not learn which player's set they came from. The shuffle can be easily accomplished with standard techniques [13, 17, 24, 31, 44], with communication complexity at most $O(n^2 k)$. The multiset formed by those shuffled elements that represent elements of $P$ is $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.
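The effect of the derivative construction can be reproduced in plaintext. In the illustrative sketch below, the plaintext ring is assumed to be the prime field $\mathbb{Z}_Q$, the valid element encodings $P$ are assumed to be the integers 1 to 999, the fixed polynomials $F_\ell$ are drawn as random monic polynomials of degree $\ell$ (which, with overwhelming probability, have no roots in that range), the $c + 1$ players' randomness is collapsed into one polynomial per $\ell$, and the Shuffle protocol is replaced by a local permutation. $\Phi(a) = 0$ whenever $a$ is a root of $p$ of multiplicity at least $t$, because then $p(a) = p^{(1)}(a) = \cdots = p^{(t-1)}(a) = 0$; for any other element, $\Phi(a)$ is nonzero with overwhelming probability. All names are hypothetical.

```python
import random

Q = 2**61 - 1            # assumed prime modulus standing in for R
DOMAIN = range(1, 1000)  # hypothetical set P of valid element encodings


def poly_from_set(elems):
    coeffs = [1]
    for a in elems:  # multiply by (x - a)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % Q
            nxt[i + 1] = (nxt[i + 1] + c) % Q
        coeffs = nxt
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def poly_eval(f, x):
    acc = 0
    for c in reversed(f):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def poly_deriv(f):
    """Formal derivative: coefficient i * c_i moves to degree i - 1."""
    return [(i * c) % Q for i, c in enumerate(f)][1:] or [0]


def over_threshold(sets, t, rng):
    n, k = len(sets), len(sets[0])
    p = [1]
    for s in sets:  # p = prod_i f_i represents the multiset union
        p = poly_mul(p, poly_from_set(s))
    phi, deriv = [0], p
    for ell in range(t):  # Phi = sum_ell p^(ell) * F_ell * r_ell
        F = [rng.randrange(1, Q) for _ in range(ell)] + [1]  # random monic, degree ell
        r = [rng.randrange(Q) for _ in range(n * k + 1)]     # randomness of degree nk
        phi = poly_add(phi, poly_mul(poly_mul(deriv, F), r))
        deriv = poly_deriv(deriv)
    shuffled = []
    for s in sets:
        for a in s:
            b = rng.randrange(Q)
            shuffled.append((b * poly_eval(phi, a) + a) % Q)  # U_ij
    rng.shuffle(shuffled)  # local stand-in for the Shuffle protocol
    return sorted(u for u in shuffled if u in DOMAIN)


sets = [[1, 2, 3, 4], [2, 3, 5, 6], [2, 7, 8, 9]]
print(over_threshold(sets, 2, random.Random(11)))
# [2, 2, 2, 3, 3] with overwhelming probability
```

With $t = 2$, element 2 (three copies) and element 3 (two copies) are each revealed once per copy, matching the multiset output described in Theorem 10.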


Security Analysis. We show that our protocol is correct, as each player learns the appropriate answer set at its termination, and secure in the honest-but-curious model, as no player gains information that it would not gain when using its input in the ideal model with a trusted third party. A formal statement of these properties is as follows:

Theorem 10. In the Over-Threshold Set-Union protocol of Fig. 4.4, every honest-but-curious player learns each element $a$ which appears at least $t$ times in the union of the $n$ players' private inputs, as well as the number of times it so appears, with overwhelming probability.

Proof. All players calculate and decrypt $\Phi = \sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$. As the polynomials $\sum_{i=1}^{c+1} r_{i,\ell}$ ($0 \le \ell \le t-1$) are distributed uniformly over all polynomials of degree $nk$ and, by the reasoning of Lemma 19, all coefficients of $p^{(\ell)} * F_\ell$ ($0 \le \ell \le t-1$) are in the set $R^* \cup \{0\}$, Lemma 2 tells us that $\Phi = \gcd(p^{(t-1)}, \ldots, p) * u$, where $u$ is a random polynomial of the appropriate size. As $u$ has only a polynomial number of roots, each of which has a negligible probability of representing a member of $P$, $u$ is a polynomial representation of the empty set with overwhelming probability.

By Theorem 4, $\gcd(p^{(t-1)}, \ldots, p)$ has roots which are exactly those elements that appear at least $t$ times in the players' private inputs (the threshold set). The players calculate elements $U_{ij}$, which are uniformly distributed if $(S_i)_j$ is not a member of the threshold set, and equal to $(S_i)_j$ if it does appear in the threshold set. These elements are shuffled and distributed to all players. Each reveals an element of the private input if that element is in the threshold set, and nothing otherwise. Thus each element in the threshold set is revealed as many times as it appeared in the private inputs. □

Theorem 11. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, with overwhelming probability, in the Over-Threshold Set-Union protocol of Fig. 4.4, any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the cryptosystem employed is semantically secure, and so players learn only the formula $\Phi = \sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$. Note that the polynomials $\sum_{i=1}^{c+1} r_{i,\ell}$ ($0 \le \ell \le t-1$) are uniformly distributed and unknown to all players, as the maximum coalition size is smaller than $c + 1$. Note that by the reasoning of Lemma 19, all coefficients of $p^{(\ell)} * F_\ell$ ($0 \le \ell \le t-1$) are in the set $R^* \cup \{0\}$. Thus, by Theorem 2,

$\Phi = \gcd(p^{(t-1)} * F_{t-1}, p^{(t-2)} * F_{t-2}, \ldots, p * F_0) * s$, for some uniformly distributed polynomial $s$. As $s$ is uniformly distributed for any player inputs, no player or coalition can learn more than $\gcd(p^{(t-1)} * F_{t-1}, p^{(t-2)} * F_{t-2}, \ldots, p * F_0)$. The polynomials $F_0, \ldots, F_{t-1}$ are chosen such that $\gcd(p, F_0, \ldots, F_{t-1}) = 1$ with overwhelming probability, and so $\gcd(p^{(t-1)} * F_{t-1}, p^{(t-2)} * F_{t-2}, \ldots, p * F_0) = \gcd(p^{(t-1)}, \ldots, p)$ with overwhelming probability. As was observed in Theorem 10, this information exactly represents the threshold set, and can thus be derived from the answer that would be returned by a trusted third party. Thus no player or coalition of at most $c$ players can learn more than in the ideal model.

Neither  do  the  shuffled  elements  reveal  additional  information.  As  we  assume  the  shuffling 
protocol  is  secure,  the  origin  of  any  element  is  not  revealed.  The  elements  revealed  are  exactly 
those  in  the  threshold  set,  each  included  as  many  times  as  it  was  included  in  the  private  inputs, 
and  thus  also  do  not  reveal  information  to  any  adversary.  □ 

4.3.2  Threshold  Set-Union 

Problem Definition. We define the Threshold Set-Union problem as follows: all players learn which elements appear in the combined private input of the players at least a threshold number $t$ times. For example, assume that $a$ appears in the combined private input of the players 15 times. If $t = 10$, then all players learn $a$. However, if $t = 16$, then no player learns $a$. This problem differs from the Over-Threshold Set-Union problem in that each player learns the elements of $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$ without learning how often each element appears.

We offer protocols for several variants of Threshold Set-Union: threshold contribution, perfect, and semi-perfect. Threshold contribution allows for thresholds $t \ge 1$, and each player learns only those elements which appear both in his private input and in the threshold set: player $i$ ($1 \le i \le n$) learns the elements of $S_i \cap Rd_{t-1}(S_1 \cup \cdots \cup S_n)$. Perfect threshold set-union allows for thresholds $t \ge 1$, and conforms exactly to the definition of threshold set-union. The semi-perfect variant requires for security that $t \ge 2$, and that the cheating coalition does not include any single element more than $t - 1$ times in their private inputs. Note that the information illicitly gained by the coalition when they include more than $t - 1$ copies of an element $a$ is restricted to a possibility of learning that there exists some other player whose private input contains $a$. We do not consider the difference in security between the semi-perfect and perfect variants to be significant.

The protocols for the Threshold Set-Union problem, given in Figs. 4.5, 4.6, and 4.7, are identical to the protocol for Over-Threshold Set-Union (given in Fig. 4.4) in steps 1 through 5. We explain the differences between the protocols for each variant: threshold contribution, semi-perfect, and perfect. Each player constructs encryptions of the elements $\Phi((S_i)_j)$ from his private input set in step 6, and continues as described below.


Threshold Contribution Threshold Set-Union. This protocol is given in Fig. 4.6. The players cooperatively decrypt the encrypted elements $\Phi((S_i)_j) \times \sum_{\ell=1}^{n} b_{\ell,i,j}$. This decryption must take place in such a way that only player $i$ learns the element $\Phi((S_i)_j) \times \sum_{\ell=1}^{n} b_{\ell,i,j}$. Typically, parties produce decryption shares and reconstruct the element from them; player $i$ simply retains his decryption share, so that only he learns the decryption. Thus each player learns which of his elements appear in the threshold set, since if $(S_i)_j$ appears in the threshold
Protocol: Threshold-SemiPerfect-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key for a homomorphic cryptosystem. The threshold number of repetitions at which an element appears in the output is $t$. $F_0, \ldots, F_{t-1}$ are fixed polynomials of degree $0, \ldots, t-1$ which have no common factors or roots representing elements of $P$.

Output: Each player learns the elements of $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the algorithm given in Sec. 4.1.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players $2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the first through $(t-1)$st derivatives of $p$, denoted $p^{(1)}, \ldots, p^{(t-1)}$, by repeating the algorithm given in Sec. 4.1.2.

(b) chooses random polynomials $r_{i,0}, \ldots, r_{i,t-1} \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * r_{i,\ell}$ and sends it to all other players.

6. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $\Phi = \sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.1.2

(b) for each $j = 1, \ldots, k$ calculates an encrypted tag $T_{ij} = \mathrm{Enc}_i(h((S_i)_j) \| (S_i)_j)$

(c) for each $j = 1, \ldots, k$ chooses a random number $r_{ij} \leftarrow R$ and calculates an encrypted element

$U_{ij} = (r_{ij} \times_h E_{pk}(c_{ij})) +_h E_{pk}((S_i)_j)$

(d) constructs the set $V_i = \{(T_{ij} \| U_{ij}) \mid 1 \le j \le k\}$

7. By using the Shuffle protocol, players perform shuffling on their private input sets $V_i$.

8. For each shuffled element $T \| U$ in sorted order, each player $i = 1, \ldots, n$

(a) if $D_i(T) = h(a) \| a$ for some $a$

i. if $a$ has previously been revealed to be in the threshold set, then calculate an incorrect decryption share of $U$, and send it to all other players

(b) else calculate a decryption share of $U$, and send it to all other players

(c) reconstruct the decryption of $U$. If the element $a \in P$, then $a$ is in the threshold result set

Figure 4.5: Threshold Set-Union protocol secure against honest-but-curious adversaries (semi-perfect variant).


set, $\Phi((S_i)_j) \times \sum_{\ell=1}^{n} b_{\ell,i,j} = 0$. No player learns more information, because if an element $(S_i)_j$ is not in the threshold set, $\Phi((S_i)_j) \times \sum_{\ell=1}^{n} b_{\ell,i,j}$ is uniformly distributed.
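The contribution variant can be sketched in plaintext under illustrative assumptions: a prime field $\mathbb{Z}_Q$ stands in for $R$, the fixed polynomials $F_\ell$ are random monic polynomials, all encryption and the share-based private decryption are omitted, and "only player $i$ learns $a_{ij}$" is modeled by placing the zero test in player $i$'s own result list. Each evaluation $\Phi((S_i)_j)$ is multiplied by the sum of $n$ random blinding factors, one per player, so it stays uniform unless it is zero. All names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for R


def poly_from_set(elems):
    coeffs = [1]
    for a in elems:  # multiply by (x - a)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % Q
            nxt[i + 1] = (nxt[i + 1] + c) % Q
        coeffs = nxt
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def poly_eval(f, x):
    acc = 0
    for c in reversed(f):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def poly_deriv(f):
    return [(i * c) % Q for i, c in enumerate(f)][1:] or [0]


def build_phi(sets, t, rng):
    """Plaintext Phi = sum_ell p^(ell) * F_ell * r_ell for p = prod_i f_i."""
    n, k = len(sets), len(sets[0])
    p = [1]
    for s in sets:
        p = poly_mul(p, poly_from_set(s))
    phi, deriv = [0], p
    for ell in range(t):
        F = [rng.randrange(1, Q) for _ in range(ell)] + [1]
        r = [rng.randrange(Q) for _ in range(n * k + 1)]
        phi = poly_add(phi, poly_mul(poly_mul(deriv, F), r))
        deriv = poly_deriv(deriv)
    return phi


def threshold_contribution(sets, t, rng):
    """Return, for each player i, the elements of S_i in the threshold set."""
    phi = build_phi(sets, t, rng)
    n = len(sets)
    results = []
    for s in sets:
        mine = []
        for a in s:
            c = poly_eval(phi, a)  # c_ij = Phi((S_i)_j)
            # every player l contributes a blinding factor b_{l,i,j}
            blind = sum(rng.randrange(Q) for _ in range(n)) % Q
            if c * blind % Q == 0:  # a_ij, which only player i would learn
                mine.append(a)
        results.append(mine)
    return results


sets = [[1, 2, 3, 4], [2, 3, 5, 6], [2, 7, 8, 9]]
print(threshold_contribution(sets, 2, random.Random(3)))
# [[2, 3], [2, 3], [2]] with overwhelming probability
```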


Semi-Perfect Threshold Set-Union. This protocol is given in Fig. 4.5. The encrypted element $(U_i)_j$ calculated from the encrypted evaluation of $\Phi((S_i)_j)$ is either (1) an encryption of the private input element $(S_i)_j$ (if $(S_i)_j$ is in the threshold set) or (2) an encryption of a random element (otherwise). However, the player also constructs a corresponding encrypted tag $T_{ij}$ for each $(U_i)_j$. We require that the cryptosystem used to construct these tags be key-private, so that the origin of a ciphertext pair $T, U$ cannot be ascertained from the key used to construct the tag.

The  players  then  correctly  obtain  a  decryption  of  each  element  in  the  threshold  set  exactly 
once.  Any  other  time  a  ciphertext  U  for  an  element  in  the  threshold  set  is  decrypted,  a  player 
Protocol: Threshold-Contribution-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key for a homomorphic cryptosystem. The threshold number of repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of degree $t - 1$ which has no roots representing elements of $P$. The threshold number of repetitions at which an element appears in the output is $t \ge 2$. $F_0, \ldots, F_{t-1}$ are fixed polynomials of degree $0, \ldots, t-1$ which have no common factors or roots representing elements of $P$.

Output: Each player $i$ ($1 \le i \le n$) determines $S_i \cap Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the algorithm given in Sec. 4.1.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players $2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the first through $(t-1)$st derivatives of $p$, denoted $p^{(1)}, \ldots, p^{(t-1)}$, by repeating the algorithm given in Sec. 4.1.2.

(b) chooses random polynomials $r_{i,0}, \ldots, r_{i,t-1} \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * r_{i,\ell}$ and sends it to all other players.

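6. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $\Phi = \sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * \left(\sum_{i=1}^{c+1} r_{i,\ell}\right)$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.1.2, and sends them to all players

(b) chooses random elements $b_{i,j,\ell} \leftarrow R$ ($1 \le j \le n$, $1 \le \ell \le k$)

(c) for each ciphertext $E_{pk}(c_{j\ell})$, calculates $b_{i,j,\ell} \times_h E_{pk}(c_{j\ell})$ ($1 \le j \le n$, $1 \le \ell \le k$)

7. The players $i$ ($1 \le i \le n$) calculate $U_{jm} = (b_{1,j,m} \times_h E_{pk}(c_{jm})) +_h \cdots +_h (b_{n,j,m} \times_h E_{pk}(c_{jm}))$ ($1 \le j \le n$, $1 \le m \le k$)

8. All players decrypt the ciphertexts $U_{ij}$, so that only player $i$ learns the decryption $a_{ij}$.

For each player $i$ ($1 \le i \le n$), if $a_{ij} = 0$ ($1 \le j \le k$), then $(S_i)_j$ is in his result set.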

Figure 4.6: Threshold Set-Union protocol secure against honest-but-curious adversaries (threshold contribution variant).


sabotages it. In group decryption schemes, players generally produce shares of the decrypted element; if one player sends a uniformly generated share instead of a valid one, the decrypted element is uniform. If the decrypted element is uniform, it conveys no information to the players. To ensure that an encryption of an element in the threshold set is not decrypted once the element is known to be in the threshold set, a player sabotages the decryption under the following conditions: (1) he can decrypt the tag to $h(a) \| a$ for some $a$, and (2) $a$ has already been determined to be a member of the threshold set. All other ciphertexts should be correctly decrypted; either they are encryptions of elements in the threshold set which have not yet been decrypted, or they are encryptions of random elements.

Note that this protocol is the only protocol proposed in this chapter with a non-constant number of rounds. Because of the need to sabotage decryptions based on the results of past decryptions, there are $O(nk)$ rounds in this protocol.


Perfect Threshold Set-Union. This protocol is given in Fig. 4.7. Each player constructs the encrypted elements $(U_i)_j$ from the encrypted evaluation of $\Phi((S_i)_j)$ as written in step 6 of Figure 4.5. The players then utilize the Shuffle protocol to anonymously distribute these elements. If an element appears in the threshold set, then at least one encryption of it appears
Protocol: Threshold-Perfect-HBC

Input: There are $n \ge 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key for a homomorphic cryptosystem. The threshold number of repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of degree $t - 1$ which has no roots representing elements of $P$. The threshold number of repetitions at which an element appears in the output is $t \ge 2$. $F_0, \ldots, F_{t-1}$ are fixed polynomials of degree $0, \ldots, t-1$ which have no common factors or roots representing elements of $P$. $\mathrm{IsEq}(C, C') = 1$ if the ciphertexts $C, C'$ encode the same plaintext, and 0 otherwise.

Output: Each player determines the elements of $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the algorithm given in Sec. 4.1.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players $2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the first through $(t-1)$st derivatives of $p$, denoted $p^{(1)}, \ldots, p^{(t-1)}$, by repeating the algorithm given in Sec. 4.1.2.

(b) chooses random polynomials $r_{i,0}, \ldots, r_{i,t-1} \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $\sum_{\ell=0}^{t-1} p^{(\ell)} * F_\ell * r_{i,\ell}$ and sends it to all other players.

6.  Each  player  i  =  1, . . .  ,n 

(a)  evaluates  the  encryption  of  the  polynomial  >1>  =  *  Ft  *  (X^i^i  ^i,^)  a-t  each  input 

{Si)j,  obtaining  encrypted  elements  Epk{cij)  where  dj  —  <f>((S'i)j),  using  the  algorithm  given 
in  Sec.  4.1.2,  and  sends  them  to  all  players 

(b)  for  each  i'  =  l,...,n,  j  =  l,...,k  chooses  a  random  number  Ti/j  ^  R  and  calculates  an 
encrypted  element  Uij  =  (ri'j  Xh  Epkici'j)),  and  sends  it  to  player  i' 

(c)  calculates  the  elements  for  j  =  1, . . .  ,k 

Uij  ~  i'^lj  h  Epk.^Ctj'))  ~\~h  ...  ~\~h  Xh  Epki^Cnj}^  h  Epk(^(^Si^ 

(d)  constructs  the  set  Vi  =  {Uij  \  1  <  i  <  fc} 

7.  By  using  the  Shuffle  protocol,  all  players  perform  shuffling  on  their  private  input  sets  Vi,  obtaining 
the  set  U' . 

8.  For  each  shuffled  ciphertext  Ut  with  arbitrary  ordering  index  £  £  [nk],  the  players  i  =  I, . . .  ,n 

(a)  each  player  i  chooses  random  elements  qi^t  <—  R 

(b)  calculate  Wt  =  +h  Epk  ((Er=i  9^/)  (IsEq([/;,  U{_,)  +  •  •  •  +  IsEq(t/;,  [/{))) 

9.  All  players  1, . . . ,  n  decrypt  each  ciphertext  Wt,  obtaining  an  element  at  (1  <  £  <  nk). 

If  aj  £  E  (1  <  J  <  k),  then  Uj  is  a  member  of  the  result  set. 

Figure  4.7:  Threshold  Set-Union  protocol  secure  against  honest-but-curious  adversaries  (perfect 
variant). 


in the shuffled ciphertexts. The players ensure in step 8 that all duplicates (ciphertexts of the same element) except the first have a random element added to them. This disguises the number of players who have each element of the threshold set in their private input. Let the shuffled ciphertexts $U'$ have an arbitrary ordering $U'_1, \dots, U'_{nk}$, and let $\mathrm{IsEq}(C, C') = 1$ if the ciphertexts $C, C'$ encode the same plaintext, and 0 otherwise. (This calculation can be achieved with the techniques in [36].) The players $i \in [n]$ then choose random elements $q_{i,\ell} \leftarrow R$ ($1 \le \ell \le nk$) and decrypt the ciphertexts $U'_\ell +_h E_{pk}\left(\left(\sum_{i=1}^{n} q_{i,\ell}\right)\left(\mathrm{IsEq}(U'_\ell, U'_{\ell-1}) + \cdots + \mathrm{IsEq}(U'_\ell, U'_1)\right)\right)$. Thus,

if $U'_\ell$ is a duplicate (an encryption of an element which also appeared earlier in the ordering), it has a uniformly distributed element added to it, and conveys no information. Each element of the threshold set is decrypted exactly once, and all players thus learn the threshold set.
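To see why step 8 leaves each element of the threshold set visible exactly once, the following is a small plaintext model of that step. It is only a sketch under stated assumptions: a prime modulus stands in for $R$, IsEq is evaluated in the clear (the protocol computes it under encryption, with the techniques of [36]), the players' values $q_{1,\ell}, \dots, q_{n,\ell}$ are collapsed into a single random draw, and the function name `mask_duplicates` is hypothetical.

```python
import random

MODULUS = 2**61 - 1  # prime stand-in for the plaintext ring R (assumption)


def mask_duplicates(shuffled, rng):
    """Plaintext model of step 8 of Fig. 4.7: W_l = U'_l + q_l * (#earlier copies)."""
    masked = []
    for ell, u in enumerate(shuffled):
        # IsEq(U'_l, U'_{l-1}) + ... + IsEq(U'_l, U'_1), computed here in the clear
        earlier_copies = sum(1 for prev in shuffled[:ell] if prev == u)
        # a single draw stands in for q_{1,l} + ... + q_{n,l}
        q = rng.randrange(MODULUS)
        masked.append((u + q * earlier_copies) % MODULUS)
    return masked


print(mask_duplicates([42, 7, 42, 99, 7, 42], random.Random(0)))
```

The first copies of 42, 7 and 99 pass through unchanged, because their IsEq sum is 0; every later copy is shifted by a nonzero multiple of a uniformly random value, so, except with negligible probability, it no longer reveals the repeated element.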
Security Analysis. We show that our protocol is correct, as each player learns the appropriate result set at its termination, and secure in the honest-but-curious model, as no player gains information that it would not gain when using its input in the ideal model. A formal statement of these properties is as follows:

Theorem 12. In the Threshold Contribution Threshold Set-Union protocol of Fig. 4.6, every player $i$ ($1 \le i \le n$) learns the set $S_i \cap \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Note that the encrypted computation is performed in accordance with Theorems 3 and 5, and thus the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. Each player $i$ ($1 \le i \le n$) constructs encrypted evaluations of $\Phi$ at each $a \in S_i$, which are then homomorphically multiplied by a uniformly distributed element by all players. Thus, each ciphertext constructed in this fashion is an encryption of either 0 (meaning $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$) or a uniformly distributed element (meaning $a \notin \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$). These ciphertexts are then decrypted; thus, each player $i$ learns which elements of his private input appear in the threshold set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. □
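The proof rests on $\Phi$ vanishing exactly at the elements that occur at least $t$ times. An element of multiplicity $m$ is a root of $p, p^{(1)}, \dots, p^{(m-1)}$, so it is a root of $\Phi$ whenever $m \ge t$; for $m < t$ the term $p^{(m)} * F_m * r_m$ makes $\Phi(a)$ random. The Python sketch below checks this numerically. It works entirely in the clear over a prime field; the modulus `Q`, the helper names, and the particular choice of $F_\ell$ are illustrative assumptions, not the encrypted algorithms of Sec. 4.1.2.

```python
import random

Q = 2**61 - 1  # prime modulus standing in for the plaintext ring R (assumption)


def poly_mul(a, b):
    """Product of coefficient lists (lowest degree first) modulo Q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % Q for x, y in zip(a, b)]


def derivative(a):
    return [(i * a[i]) % Q for i in range(1, len(a))] or [0]


def evaluate(a, x):
    acc = 0
    for c in reversed(a):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def from_roots(roots):
    p = [1]
    for r in roots:
        p = poly_mul(p, [(-r) % Q, 1])
    return p


def threshold_phi(multiset, t, rng):
    """Phi = sum_{l=0}^{t-1} p^(l) * F_l * r_l, computed in the clear."""
    p, k = from_roots(multiset), len(multiset)
    phi, deriv = [0], p
    for ell in range(t):
        # Hypothetical fixed F_l: degree l, roots far away from the test elements.
        F = from_roots([10**9 + 1000 * ell + m for m in range(ell)])
        r = [rng.randrange(Q) for _ in range(k + 1)]  # random polynomial r_l
        phi = poly_add(phi, poly_mul(poly_mul(deriv, F), r))
        deriv = derivative(deriv)  # move on to p^(l+1)
    return phi


phi = threshold_phi([3, 3, 3, 5, 5, 7], t=2, rng=random.Random(1))
print([a for a in (3, 5, 7, 11) if evaluate(phi, a) == 0])
```

With $t = 2$, the elements 3 and 5 (multiplicity at least 2) are always roots of $\Phi$, while 7 and 11 are roots only with probability about $1/Q$, matching $\mathrm{Rd}_1$ of this multiset.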

Theorem 13. In the Semi-Perfect Threshold Set-Union protocol of Fig. 4.5, each player $i$ ($1 \le i \le n$) learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Following the proof of Theorem 12, the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability, and each shuffled element $T \,\|\, U$ is of one of the following forms:

• For some $i$ ($1 \le i \le n$) and $a \in S_i$, $T = \mathrm{Enc}_i(h(a) \,\|\, a)$ and $U$ is of the form $E_{pk}(a)$; thus, $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$.

• For some $i$ ($1 \le i \le n$) and $a \in S_i$, $T = \mathrm{Enc}_i(h(a) \,\|\, a)$ and $U$ is not of the form $E_{pk}(a)$; thus, $a \notin \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$.


The operation of Step 8 ensures that, for each $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, a corresponding $U$ is correctly decrypted exactly once; all other decryptions of $a$ are sabotaged to appear uniformly distributed. Thus, all players learn the elements of the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. □

Theorem 14. In the Perfect Threshold Set-Union protocol of Fig. 4.7, every player learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Following the proof of Theorem 12, the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability, and each shuffled (encrypted) element $U'_\ell$ ($1 \le \ell \le nk$) is an encryption of one of the following: some $a \in P$ (indicating that $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$), or a uniformly distributed element (which can be distinguished from a representation of an element of $P$ with overwhelming probability). Note that, if $U'_\ell$ is an encryption of an element $a$, and there is no $\ell' \in [\ell - 1]$ such that $U'_{\ell'}$ is also an encryption of $a$, then $W_\ell$ is also an encryption of $a$. (Otherwise, $W_\ell$ is an encryption of a uniformly distributed element.)

This calculation results in a list of encrypted elements $W_\ell$, each of which is an encryption of one of the following: some $a \in P$ (indicating both that $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$ and that $W_\ell$ is, with overwhelming probability, the only encryption of $a$ in the list), or a uniformly distributed element. Thus, when the players decrypt the list $W_\ell$, they learn all elements of $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$ exactly once, with overwhelming probability. □
Theorem 15. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure and that the Shuffle protocol is secure, with overwhelming probability, in the Threshold Set-Union protocols of Figs. 4.5, 4.6, and 4.7, any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. Note that in the threshold contribution and perfect variants of Threshold Set-Union, all data is encrypted until the final result sets are revealed through joint decryption. As shown in Theorems 12 and 14, the final sets correspond exactly to the elements revealed (all elements that are not in the result set are uniformly distributed over $R$, and thus hold no information); hence no information except the result set is revealed to the players.

In the protocol for semi-perfect Threshold Set-Union, the result set is not decrypted all at once, but one element at a time. Theorem 13 shows that the resulting elements correspond exactly to the desired result set, but we must show that the behavior of each player during the process of decryption yields no disallowed information. Note that we require, for the security of this protocol, that a dishonest coalition hold no more than $t - 1$ copies of any given element in their private input sets.

When  performing  the  decryption  process,  each  player  learns  two  pieces  of  information  when 
a  result  set  element  is  revealed:  the  element,  and  whether  the  element  revealed  came  from  that 
player’s  own  private  input  multiset.  Each  ciphertext  is  ‘tagged’,  so  each  player  can  easily  decide 
whether  they  constructed  that  ciphertext.  Thus,  if  a  dishonest  coalition  held  at  least  t  copies 
of  any  given  element,  they  could  determine  that  at  least  one  other  player  also  held  a  copy  of 
that  element,  revealing  forbidden  information.  However,  as  we  have  precluded  this  situation,  no 
information is revealed; if a dishonest coalition holds $t - 1$ copies of an element which appears
in  the  result  set,  they  already  know  that  at  least  one  other  player  holds  it  (otherwise  it  would 
not  appear  in  the  result  set!).  □ 

4.3.3  Malicious  Case 

By adding zero-knowledge proofs to our Over-Threshold Set-Union protocol secure against honest-but-curious adversaries, we extend our results to enable security against malicious adversaries. We provide details of our protocol secure against malicious adversaries in Section 4.4.4.


4.4 Set-Intersection, Cardinality Set-Intersection, and Over-Threshold Set-Union for Malicious Parties

We extend the protocols for the Set-Intersection, Cardinality Set-Intersection, and Over-Threshold Set-Union problems given in Sections 4.2 and 4.3 to obtain security against adversaries in the malicious model. To obtain this result, we add zero-knowledge proofs, verified by all players, to ensure the correctness of all computation. In this section, we first introduce notation for zero-knowledge proofs, then give the protocols secure against malicious parties.

4.4.1  Tools 

In  this  section,  we  describe  cryptographic  tools  that  we  utilize  in  our  protocols  secure  against 
malicious  players. 
Zero-Knowledge Proofs. We utilize several zero-knowledge proofs in our protocols for the malicious adversary model. We introduce the notation for these zero-knowledge proofs below; for additively homomorphic cryptosystems such as Paillier, we can efficiently construct these zero-knowledge proofs using standard constructions [9, 14].

• $\mathrm{POPK}\{E_{pk}(x)\}$ denotes a zero-knowledge proof that, given a public ciphertext $E_{pk}(x)$, the player knows the corresponding plaintext $x$ [15].

• $\mathrm{ZKPK}\{f \mid p' = f *_h \alpha\}$ is shorthand notation for a zero-knowledge proof of knowledge that the prover knows a polynomial $f$ such that the encrypted polynomial $p' = f *_h \alpha$, given the encrypted polynomials $p'$ and $\alpha$.

• $\mathrm{ZKPK}\{f \mid (p' = f *_h \alpha) \wedge (y = E_{pk}(f))\}$ is the proof $\mathrm{ZKPK}\{f \mid p' = f *_h \alpha\}$ with the additional constraint that $y = E_{pk}(f)$ ($y$ is the encryption of $f$), given the encrypted polynomials $p'$, $y$, and $\alpha$.
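As one concrete illustration of $\mathrm{POPK}\{E_{pk}(x)\}$, the sketch below runs the well-known three-move (Sigma) proof of plaintext knowledge for Paillier encryption with generator $g = 1 + N$. Whether this is exactly the construction of [9, 14, 15] is an assumption; the primes are toy-sized, the challenge is sent interactively rather than derived from a hash, and all function names are hypothetical.

```python
import math
import random

# Toy Paillier key (primes far too small for real use).
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2


def encrypt(m, r):
    """Paillier encryption with g = 1 + N: c = (1+N)^m * r^N mod N^2."""
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def random_unit(rng):
    while True:
        s = rng.randrange(1, N)
        if math.gcd(s, N) == 1:
            return s


def prover_commit(rng):
    """Move 1: the prover sends a = E(x; s) for fresh random x, s."""
    x, s = rng.randrange(N), random_unit(rng)
    return (x, s), encrypt(x, s)


def prover_respond(state, m, r, e):
    """Move 3: answer challenge e using the witness (m, r) of c = E(m; r)."""
    x, s = state
    return (x + e * m) % N, (s * pow(r, e, N)) % N


def verify(c, a, e, z, w):
    """Accept iff E(z; w) == a * c^e (mod N^2)."""
    return encrypt(z, w) == (a * pow(c, e, N2)) % N2


rng = random.Random(7)
m, r = 123456, random_unit(rng)
c = encrypt(m, r)                     # public ciphertext E_pk(m)
state, a = prover_commit(rng)
e = rng.randrange(256)                # challenge kept below the smallest prime factor of N
z, w = prover_respond(state, m, r, e)
print(verify(c, a, e, z, w))          # True
```

Verification works because $g = 1 + N$ has order $N$ modulo $N^2$ and $w^N \bmod N^2$ depends only on $w \bmod N$, so $E(z; w) = g^{x + em} s^N r^{eN} = a \cdot c^e$.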


Equivocal  Commitment.  A  standard  commitment  scheme  allows  parties  to  give  a  “sealed 
envelope”  that  can  be  later  opened  to  reveal  exactly  one  value.  We  use  an  equivocal  commitment 
scheme  in  our  protocols  secure  against  malicious  players,  such  that  the  simulator  can  open  the 
‘envelope’  to  an  arbitrary  value  without  being  detected  by  the  adversary  [33,  39]. 

4.4.2  Set-Intersection  Protocol  for  Malicious  Adversaries 

Our protocol for malicious parties performing Set-Intersection, given in Fig. 4.8, proceeds largely as the protocol secure against honest-but-curious parties, which was given in Fig. 4.2. The commitments to the data items $A(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs to prevent three forms of misbehavior: choosing ciphertexts for the encrypted coefficients of $f_i$ without knowledge of their plaintext, not performing the polynomial multiplication $f_j * r_{i,j}$ correctly, and not performing decryption correctly. We also constrain the leading coefficient of $f_i$ to be 1 for all players, to prevent any player from setting their polynomial to 0; if $f_i = 0$, every element is a root, and thus it can represent an unlimited number of elements. We can thus detect or prevent misbehavior from malicious players, forcing this protocol to operate like the honest-but-curious protocol in Fig. 4.2. The protocol can gain efficiency by taking advantage of the maximum coalition size $c$.

Our set-intersection protocol secure against malicious parties utilizes an expensive ($O(k^2)$-size) zero-knowledge proof to prevent malicious parties from cheating when multiplying the polynomial $r_{i,j}$ by the encryption of the polynomial $f_j$. Each player $i$ must commit to each polynomial $r_{i,j}$ ($1 \le i, j \le n$) for the purpose of constructing this zero-knowledge proof. We may easily replace this proof with the cut-and-choose technique, which requires only $O(k)$ communication.
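The multiplication of an encrypted polynomial $f_j$ by a plaintext polynomial $r_{i,j}$ needs only the additive homomorphism: coefficient $m$ of $f_j * r_{i,j}$ is $\sum_{a+b=m} f_a r_b$, which is obtained as $\prod E(f_a)^{r_b}$. The sketch below illustrates this with a toy Paillier key and also fixes the leading coefficient to a known encryption of 1 (here the deterministic ciphertext $N + 1$), mirroring step 2(c) of Fig. 4.8. It is a minimal sketch under stated assumptions: toy primes, no re-randomization of the output ciphertexts, hypothetical function names, and it is not necessarily the exact algorithm of Sec. 4.1.2.

```python
import math
import random

# Toy Paillier key with g = 1 + N (primes far too small for real use).
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)


def encrypt(m, rng):
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def mul_encrypted_poly(enc_f, r):
    """E(f) *_h r: coefficient m of f*r is sum_{a+b=m} f_a r_b = prod E(f_a)^{r_b}."""
    out = [1] * (len(enc_f) + len(r) - 1)  # 1 is E(0) with randomness 1
    for a, cf in enumerate(enc_f):
        for b, rb in enumerate(r):
            out[a + b] = out[a + b] * pow(cf, rb % N, N2) % N2
    return out


rng = random.Random(5)
f = [6, -5, 1]                                        # f = (x - 2)(x - 3), lowest degree first
enc_f = [encrypt(c, rng) for c in f[:-1]] + [N + 1]   # leading coefficient: fixed E(1)
r = [4, 9]                                            # plaintext r = 9x + 4
print([decrypt(c) for c in mul_encrypted_poly(enc_f, r)])  # [24, 34, 1022076, 9]
```

The decrypted coefficients are those of $9x^3 - 41x^2 + 34x + 24$, with $-41$ represented modulo $N = 1022117$.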


Security Analysis. We provide a simulation proof of this protocol's security; an intermediary G translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world, where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world; thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 16. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, and the specified zero-knowledge proofs and proofs of correct decryption
Protocol: Set-Intersection-Mal

Input: There are $n \ge 2$ players, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme; each player holds any additional inputs necessary for this scheme, such as a common reference string.

Output: Each player determines $S_1 \cap \cdots \cap S_n$.

All players verify the correctness of all proofs sent to them, and stop participating in the protocol if any are not correct.

Each player $i = 1, \dots, n$:

1. (a) calculates the polynomial $f_i$ such that the $k$ roots of the polynomial are the elements of $S_i$, as

$f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends $\beta_i$, the encryption of the polynomial $f_i$, to all other players along with proofs of plaintext knowledge for all coefficients except the leading coefficient ($\mathrm{POPK}\{(\beta_i)_j\}$, $0 \le j < k$).

(c) for $1 \le j \le n$

i. chooses a random polynomial $r_{i,j} \leftarrow R^k[x]$

ii. sends a commitment to $A(r_{i,j})$ to all players, where $A(r_{i,j}) = E_{pk}(r_{i,j})$

2. for $1 \le j \le n$

(a) opens the commitment to $A(r_{i,j})$

(b) verifies proofs of plaintext knowledge for the encrypted coefficients of $f_j$

(c) sets the leading encrypted coefficient (for $x^k$) to a known encryption of 1

(d) calculates $\mu_{i,j}$, the encryption of the polynomial $p_{i,j} = f_j * r_{i,j}$, with proofs of correct multiplication $\mathrm{ZKPK}\{r_{i,j} \mid (\mu_{i,j} = r_{i,j} *_h \beta_j) \wedge (A(r_{i,j}) = E_{pk}(r_{i,j}))\}$, and sends it to all other players

3. All players

(a) calculate the encryption of the polynomial $p = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{i,j}$ as in Sec. 4.1.2, and verify all attached proofs

(b) perform a group decryption to obtain the polynomial $p$, and distribute proofs of correct decryption

Each player $i = 1, \dots, n$ determines the intersection multiset as follows: for each $a \in S_i$, he calculates $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the intersection multiset.

Figure 4.8: Set-Intersection protocol secure against malicious adversaries.


cannot be forged, then in the Set-Intersection protocol secure against malicious adversaries in Fig. 4.8, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) G operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.


Proof. In this simulation proof, we give an algorithm for a player G in the ideal model. This player communicates with the malicious players $\Gamma$, pretending to be one or more honest players in such a fashion that $\Gamma$ cannot distinguish that he is not in the real world. We assume that all malicious players can collude. The trusted third party takes the input from G and the honest parties, and gives both G and the honest parties the intersection set. G then communicates with the malicious players $\Gamma$, so they also learn the intersection set. A graphical representation of these players is given in Figure 4.9.

We  give  a  sketch  of  how  the  player  G  operates  (note  that  G  can  prevaricate  when  opening 
commitments,  as  we  use  an  equivocal  commitment  scheme,  and  can  extract  plaintext  from 
proofs  of  plaintext  knowledge): 

1.  For  each  simulated  honest  player  i,  G: 
Figure 4.9: A simulation proof defines the behavior of the player G, who translates between the malicious players $\Gamma$, who believe they are operating in the real model, and the ideal model, in which the trusted third party computes the desired answer.


(a) chooses a polynomial $f_i$ such that all such polynomials are pairwise relatively prime and have leading coefficient 1 (for randomly generated polynomials with leading coefficient 1, this is true with overwhelming probability)

(b) chooses arbitrary polynomials $r_{i,1}, \dots, r_{i,n}$ and creates encryptions $A(r_{i,j})$ from them (in the case of Paillier, specially construct encryptions of those polynomials, and proofs of knowledge of each coefficient; see Section 4.4.1)

2.  Performs  step  1  of  the  protocol: 

(a) sends the encryption of $f_i$ to all malicious players $\Gamma$, along with proofs of plaintext knowledge and commitments to $A(r_{i,j})$ ($1 \le j \le n$)

(b) sends data items $A(r_{i,j})$ ($1 \le j \le n$) to all malicious players $\Gamma$

(c) Receives from each malicious player $\alpha \in \Gamma$:

i. encryption of a polynomial $f_\alpha$ and proofs of plaintext knowledge for its coefficients

ii. trapdoor commitments to data items $A(r_{\alpha,j})$ for each random polynomial $r_{\alpha,j}$, $1 \le j \le n$

3. The player G extracts, from the proofs of plaintext knowledge and the trapdoor commitments to $A(r_{\alpha,j})$ (in the case of Paillier, the extraction is from the proof of knowledge of the discrete logarithm), the polynomials $f_\alpha$ and the random polynomials $r_{\alpha,j}$ that the malicious players $\Gamma$ have chosen.

4. G obtains the roots of each polynomial $f_\alpha$ (as these exactly determine, for the purposes of the protocol, his set):

• If polynomial factoring is possible, G may factor $f_\alpha$. Since $f_\alpha(a) = 0 \Leftrightarrow (x - a) \mid f_\alpha$, all roots of $f_\alpha$ may be determined by examining the linear factors.
• If we are working in the random oracle model, then, with overwhelming probability, to correctly represent any element of the valid set $P$, a player must consult the random oracle. As there can be only a polynomial number of such queries, for each query $a$, G may check whether $f_\alpha(a \,\|\, h(a)) = 0$.

• If neither of these routes is feasible, then a proof that $f_\alpha$ was constructed by multiplying $k$ linear factors of the form $x - a$ may be added to the protocol instead of proofs of plaintext knowledge. This proof is of size $O(k^2)$, and is constructed by using proofs of plaintext knowledge for some linear factors and layering proofs of correct multiplication to obtain the complete polynomial $f_\alpha$. From this proof, each linear factor of $f_\alpha$ can be obtained, and thus all roots of $f_\alpha$.

5. G submits the sets represented by these roots to the trusted third party. The honest players submit their private input sets to the trusted third party. The trusted third party returns the intersection set $I$ to G and the honest players.

6. G prepares to reveal the intersection set to the malicious players $\Gamma$:

(a) selects a target polynomial $p = \left(\prod_{a \in I} (x - a)\right) * s$, where $s$ is chosen uniformly from those polynomials of degree $2k - |I|$ (note that, by Lemma 2, this is exactly the polynomial calculated by simply running the protocol, as, by the reasoning of Lemma 19, all coefficients of $f_i$ ($1 \le i \le n$) are in the set $R^* \cup \{0\}$)

(b) chooses a set of polynomials $r_{i,1}, \dots, r_{i,n}$ (where $i$ is one of the simulated honest players) such that $\sum_{j=1}^{n} f_j * \left(\sum_{i'=1}^{n} r_{i',j}\right) = p$ (from the proof of Lemma 2, we know that such polynomials exist, and can be determined through simple polynomial manipulation)

7. G follows the rest of the protocol with the malicious players $\Gamma$ as written, except that he opens the trapdoor commitment to reveal an appropriate $A(r_{i,j})$ for the newly chosen $r_{i,j}$. In this way, the players calculate an encryption of the polynomial $p$ chosen by G, and then decrypt it. The coalition players thus learn the intersection set.

Note  that  the  dishonest  players  cannot  distinguish  that  they  are  talking  to  G  (who  is  working 
in  the  ideal  model)  instead  of  other  clients  (in  the  real  world),  and  the  correct  answer  is  learned 
by  all  parties,  in  both  the  real  and  ideal  models. 

□ 
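The last step of Fig. 4.8, in which each player reads off how often each of its elements occurs in the intersection, can be checked with a plaintext model. In this sketch, which illustrates the arithmetic and not the protocol itself, all computation is in the clear over a prime field standing in for $R$, a single random polynomial per $f_j$ replaces the sum $\sum_i r_{i,j}$ of the players' polynomials, and the helper names are hypothetical. The multiplicity $b$ is found by repeated synthetic division by $(x - a)$.

```python
import random

Q = 2**61 - 1  # prime stand-in for the plaintext ring R (assumption)


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % Q for x, y in zip(a, b)]


def from_roots(roots):
    p = [1]
    for r in roots:
        p = poly_mul(p, [(-r) % Q, 1])
    return p


def divide_linear(p, a):
    """Synthetic division of p (lowest degree first) by (x - a): (quotient, remainder)."""
    quot, acc = [0] * (len(p) - 1), 0
    for i in range(len(p) - 1, 0, -1):
        acc = (acc * a + p[i]) % Q
        quot[i - 1] = acc
    return quot, (acc * a + p[0]) % Q


def multiplicity(p, a):
    """Largest b with (x - a)^b | p."""
    b = 0
    while len(p) > 1:
        quot, rem = divide_linear(p, a)
        if rem != 0:
            break
        p, b = quot, b + 1
    return b


def combined_polynomial(sets, rng):
    """p = sum_j f_j * r_j, with one random r_j standing in for sum_i r_{i,j}."""
    k = len(sets[0])
    p = [0]
    for s in sets:
        r = [rng.randrange(Q) for _ in range(k + 1)]
        p = poly_add(p, poly_mul(from_roots(s), r))
    return p


sets = [[1, 4, 9, 16], [4, 9, 25, 36], [9, 4, 49, 64]]
p = combined_polynomial(sets, random.Random(2))
print({a: multiplicity(p, a) for a in sets[0] if multiplicity(p, a) > 0})
```

With overwhelming probability over the random polynomials, $(x - 4)$ and $(x - 9)$ each divide $p$ exactly once and 1 and 16 are not roots, consistent with $p$ being the intersection polynomial times a random factor, as in Lemma 2.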


4.4.3  Cardinality  Set-Intersection  Protocol  for  Malicious  Adversaries 

We  give  a  protocol,  secure  against  malicious  parties,  to  perform  Cardinality  Set-Intersection  in 
Fig.  4.10.  It  proceeds  largely  as  the  protocol  secure  against  honest-but-curious  parties,  which 
was given in Fig. 4.3. The commitments to the data items $A(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs of knowledge to prevent five forms of misbehavior: choosing $f_i$ without knowledge of its roots, choosing $f_i$ such that it is not the product of linear factors, not performing the polynomial multiplication $f_j * r_{i,j}$ correctly, not calculating the encrypted elements $(V_i)_j$ correctly (either not from the data items $(S_i)_j$ or not by evaluating the encrypted polynomial $p$), and not performing decryption correctly. We can thus
detect  or  prevent  misbehavior  from  malicious  players,  forcing  this  protocol  to  operate  like  the 
honest-but-curious  protocol  in  Fig.  4.3. 


Security Analysis. We provide a simulation proof of this protocol's security; an intermediary G translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world,
Protocol: Cardinality-Mal

Input: There are $n \ge 2$ players, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme.

Output: Each player determines $|S_1 \cap \cdots \cap S_n|$.

All players verify the correctness of all proofs sent to them, and stop participating in the protocol if any are not correct.

Each player $i = 1, \dots, n$:

1. (a) calculates the polynomial $f_i$ such that the $k$ roots of the polynomial are the elements of $S_i$, as

$f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends:

i. encrypted elements $y_{i,1} = E_{pk}((S_i)_1), \dots, y_{i,k} = E_{pk}((S_i)_k)$ to all other players, along with proofs of plaintext knowledge ($\mathrm{POPK}\{E_{pk}((S_i)_j)\}$, $1 \le j \le k$)

ii. $\beta_i$, the encryption of the polynomial $f_i$, to all other players, along with a proof of correct construction

$\mathrm{ZKPK}\{a_1, \dots, a_k \mid (\beta_i = (x - a_1) *_h \cdots *_h (x - a_{k-1}) *_h \alpha) \wedge (\alpha = E_{pk}(x - a_k)) \wedge (y_{i,1} = E_{pk}(a_1)) \wedge \cdots \wedge (y_{i,k} = E_{pk}(a_k))\}$

(c) for $1 \le j \le n$

i. chooses a random polynomial $r_{i,j} \leftarrow R^k[x]$

ii. sends a commitment to $A(r_{i,j})$ to all players, where $A(r_{i,j}) = E_{pk}(r_{i,j})$

2. for $1 \le j \le n$

(a) opens the commitment to $A(r_{i,j})$

(b) verifies proofs of plaintext knowledge for the encrypted coefficients of $f_j$

(c) sets the leading encrypted coefficient (for $x^k$) to a known encryption of 1

(d) calculates $\mu_{i,j}$, the encryption of the polynomial $p_{i,j} = f_j * r_{i,j}$, with proofs of correct multiplication $\mathrm{ZKPK}\{r_{i,j} \mid (\mu_{i,j} = r_{i,j} *_h \beta_j) \wedge (A(r_{i,j}) = E_{pk}(r_{i,j}))\}$, and sends it to all other players

3. Each player $i = 1, \dots, n$:

(a) calculates the encryption of the polynomial $p = \sum_{j=1}^{n} \sum_{i=1}^{n} p_{i,j}$, as in Sec. 4.1.2, and verifies all attached proofs

(b) evaluates the encryption of the polynomial $p$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = p((S_i)_j)$, using the algorithm given in Sec. 4.1.2.

(c) for each $j \in [k]$, chooses a random element $r_{i,j} \leftarrow R$, calculates an encrypted element $(V_i)_j = r_{i,j} \times_h E_{pk}(c_{ij})$ with attached proof of correct construction $\mathrm{ZKPK}\{(r_{i,j}, z) \mid ((V_i)_j = r_{i,j} \times_h p(z)) \wedge (y_{i,j} = E_{pk}(z))\}$, and sends the encrypted element $(V_i)_j$ and the proof of correct construction to all players

4. All players perform the Shuffle protocol on the sets $V_i$, obtaining a joint set $V$, in which all ciphertexts have been re-randomized.

5. All players $1, \dots, n$ decrypt each element of the shuffled set $V$ (and send proofs of correct decryption to all other players)

If $nb$ of the decrypted elements from $V$ are 0, then the size of the set intersection is $b$.

Figure 4.10: Cardinality set-intersection protocol secure against malicious adversaries.

where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world; thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 17. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Cardinality Set-Intersection protocol secure against malicious adversaries in Fig. 4.10, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) G operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.


Proof. The simulation proof of this theorem follows the proof of Theorem 16 with only small changes; the additional zero-knowledge proofs in the protocol are generally irrelevant to the operation of the simulator. □
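The counting rule at the end of Fig. 4.10 (if $nb$ decrypted values are 0, the intersection has size $b$) can be checked with a plaintext model. This sketch assumes, for illustration only, that all arithmetic is done in the clear over a prime field standing in for $R$, that one random polynomial per $f_j$ replaces $\sum_i r_{i,j}$, that the blinding multipliers are drawn from the nonzero elements, and that a local shuffle stands in for the Shuffle protocol; the function names are hypothetical.

```python
import random

Q = 2**61 - 1  # prime stand-in for the plaintext ring R (assumption)


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % Q for x, y in zip(a, b)]


def from_roots(roots):
    p = [1]
    for r in roots:
        p = poly_mul(p, [(-r) % Q, 1])
    return p


def evaluate(p, x):
    acc = 0
    for c in reversed(p):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def cardinality(sets, rng):
    """Plaintext model of Fig. 4.10: count zeros among blinded evaluations of p."""
    k = len(sets[0])
    p = [0]
    for s in sets:  # p = sum_j f_j * r_j (one r_j stands in for sum_i r_{i,j})
        r = [rng.randrange(Q) for _ in range(k + 1)]
        p = poly_add(p, poly_mul(from_roots(s), r))
    # Step 3(b)-(c): evaluate p at every input element, blind with a random multiplier.
    blinded = [rng.randrange(1, Q) * evaluate(p, a) % Q for s in sets for a in s]
    rng.shuffle(blinded)  # stands in for the Shuffle protocol
    zeros = sum(1 for v in blinded if v == 0)
    return zeros // len(sets)  # n*b zeros means the intersection has size b


print(cardinality([[1, 4, 9, 16], [4, 9, 25, 36], [9, 4, 49, 64]], random.Random(4)))
```

Every element of the intersection appears in all $n$ input sets and is a root of $p$, so it contributes exactly $n$ zeros; any other element evaluates to a random nonzero value with overwhelming probability, and the shuffle hides which player contributed which value.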

4.4.4  Over-Threshold  Set-Union  Protocol  for  Malicious  Adversaries 

We  give  a  protocol,  secure  against  malicious  parties,  to  perform  Over-Threshold  Set-Union  in 
Fig.  4.11.  It  proceeds  largely  as  the  protocol  secure  against  honest-but-curious  parties,  which 
was given in Fig. 4.4. The commitments to the data items $A(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs of knowledge to prevent six forms of misbehavior: choosing $f_i$ without knowledge of its roots, choosing $f_i$ such that it is not the product of linear factors, not performing the polynomial multiplication $f_i * \lambda_{i-1}$ correctly, not calculating $\alpha_{i,\ell} = p^{(\ell)} * F_\ell * r_{i,\ell}$ ($0 \le \ell \le t - 1$) correctly, not calculating the encrypted elements $(V_i)_j$ correctly (either not from the data items $(S_i)_j$ or not by evaluating the encrypted polynomial $\Phi$), and not performing decryption correctly. We can thus detect or prevent misbehavior from malicious players, forcing this protocol to operate like the honest-but-curious protocol in Fig. 4.4.


Security Analysis. We provide a simulation proof of this protocol's security; an intermediary G translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world, where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world; thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 18. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Over-Threshold Set-Union protocol secure against malicious adversaries in Fig. 4.11, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) G operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof. In this simulation proof, we give an algorithm for a player G in the ideal model. This player communicates with the malicious players $\Gamma$, pretending to be one or more honest players in such a fashion that $\Gamma$ cannot distinguish that he is not in the real world. We assume that all malicious players can collude. The trusted third party takes the input from G and the honest parties, and gives both G and the honest parties the result set. G then communicates with the malicious players $\Gamma$, so they also learn the result set. A graphical representation of these players is given in Figure 4.9.

We  give  a  sketch  of  how  the  player  G  operates  (note  that  G  can  prevaricate  when  opening 
commitments,  as  we  use  an  equivocal  commitment  scheme,  and  can  extract  plaintext  from 
proofs  of  plaintext  knowledge): 

1.  For  each  simulated  honest  player  i,  G: 

(a) chooses a set $S'_i$ of arbitrary elements $(S'_i)_1, \ldots, (S'_i)_k \in R$
Protocol: OverThreshold-Mal

Input: There are $n \ge 2$ players, $c < n$ maliciously colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme. The threshold number of repetitions at which an element appears in the output is $t$. $F_0, \ldots, F_{t-1}$ are fixed polynomials of degree $0, \ldots, t-1$ which have no common factors or roots representing elements of $P$.
Output: Each player determines $Rd_{t-1}(S_1 \cup \cdots \cup S_n)$.

All players verify the correctness of all proofs sent to them, and refuse to participate in the protocol if any are not correct.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$.

2. Players $1, \ldots, c + 1$ send commitments to $y_{i,1}, \ldots, y_{i,k}$ to all players, where $y_{i,j} = E_{pk}((S_i)_j)$ $(1 \le j \le k)$. All players then open these commitments.

3. Player 1 sends to all other players: encrypted elements $y_{1,1} = E_{pk}((S_1)_1), \ldots, y_{1,k} = E_{pk}((S_1)_k)$, along with proofs of plaintext knowledge ($POPK\{E_{pk}(y_{1,j})\}$, $1 \le j \le k$); and $\lambda_1$, the encryption of the polynomial $\Lambda_1 = f_1$, along with a proof of correct construction

$$ZKPK\{a_1, \ldots, a_k, \alpha \mid \lambda_1 = (x - a_1) *_h \cdots *_h (x - a_{k-1}) *_h \alpha \;\wedge\; y_{1,1} = E_{pk}(a_1) \wedge \cdots \wedge y_{1,k} = E_{pk}(a_k) \;\wedge\; \alpha = E_{pk}(x - a_k)\}$$

4. Each player $i = 2, \ldots, n$:

(a) receives $\lambda_{i-1}$, the encryption of the polynomial $\Lambda_{i-1}$, from player $i - 1$;

(b) sends to all other players: encrypted elements $y_{i,1} = E_{pk}((S_i)_1), \ldots, y_{i,k} = E_{pk}((S_i)_k)$, along with proofs of plaintext knowledge ($POPK\{E_{pk}(y_{i,j})\}$, $1 \le j \le k$); and $\lambda_i$, the encryption of the polynomial $\Lambda_i = f_i * \Lambda_{i-1}$, along with a proof of correct construction

$$ZKPK\{a_1, \ldots, a_k \mid \lambda_i = (x - a_1) *_h \cdots *_h (x - a_k) *_h \lambda_{i-1} \;\wedge\; y_{i,1} = E_{pk}(a_1) \wedge \cdots \wedge y_{i,k} = E_{pk}(a_k)\}$$

5. Each player $i = 1, \ldots, c + 1$:

(a) chooses random polynomials $r_{i,0}, \ldots, r_{i,t-1} \leftarrow R^{nk}[x]$;

(b) for $\ell = 0, \ldots, t - 1$, calculates the encryption of the $\ell$th derivative of $p = \Lambda_n$, denoted $p^{(\ell)}$, by repeating the algorithm given in Sec. 4.1.2;

(c) calculates $\alpha_{i,\ell}$, the encryption of the polynomial $p^{(\ell)} * r_{i,\ell}$, for $0 \le \ell \le t - 1$, and sends them to all other players, along with proofs of correct polynomial multiplication, $ZKPK\{r_{i,\ell} \mid \alpha_{i,\ell} = r_{i,\ell} *_h p^{(\ell)}\}$.

6. Each player $i = 1, \ldots, n$:

(a) calculates $\phi$, the encryption of the polynomial $\Phi = \sum_{\ell=0}^{t-1} F_\ell * p^{(\ell)} * \left(\sum_{i'=1}^{c+1} r_{i',\ell}\right)$, as $\phi = \sum_{\ell=0}^{t-1} F_\ell *_h (\alpha_{1,\ell} +_h \cdots +_h \alpha_{c+1,\ell})$ (Sec. 4.1.2), and verifies all attached proofs;

(b) evaluates the encryption of the polynomial $\Phi$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.1.2;

(c) for each $j \in [k]$ chooses a random element $x_{ij} \leftarrow R$, calculates an encrypted element $(V_i)_j = (x_{ij} \times_h E_{pk}(c_{ij})) +_h (S_i)_j$, with attached proof of correct construction $ZKPK\{(x_{ij}, z) \mid ((V_i)_j = (x_{ij} \times_h \phi(z)) +_h z) \wedge (y_{i,j} = E_{pk}(z))\}$, and sends the encrypted element $(V_i)_j$ and the proof of correct construction to all players.

7. All players perform the Shuffle protocol on the sets $V_i$, obtaining a joint set $V$, in which all ciphertexts have been re-randomized, then jointly decrypt each element of the shuffled set $V$ (and send proofs of correct decryption to all other players).

Each element $a \in P$ that appears $b$ times in $V$ is an element in the threshold set that appears $b$ times in the players' private inputs.


Figure 4.11: Over-threshold set-intersection protocol secure against malicious adversaries.


(b) performs steps 1 and 2 of the protocol, sending equivocal commitments to the set $S'_i$ for each simulated honest player.

2. The player G extracts the private input sets chosen by $\Gamma$, for each malicious player, from the equivocal commitments sent in step 2 of the protocol. G submits the sets extracted from these commitments to the trusted third party. The honest players submit their private input sets to the trusted third party. The trusted third party returns the result set $I$ to G and the honest players.

3. G prepares to reveal the result set to the malicious players $\Gamma$: G chooses new sets $S_i$ to replace the sets $S'_i$ used to construct the commitments. These sets are chosen to contain the following elements:

(a) for each element $a$ that appears $b > 0$ times in $I$, and $b_\Gamma$ times in the private input multisets of the malicious players ($\Gamma$), the element $a$ is included $b + t - 1 - b_\Gamma$ times in the multisets $S_i$;

(b) all elements not specified by the prior rule are chosen uniformly from $R$.

4. G follows the rest of the protocol with the malicious players $\Gamma$ as written. The coalition players thus learn the result set.

Note  that  the  dishonest  players  cannot  distinguish  that  they  are  talking  to  G  (who  is  working 
in  the  ideal  model)  instead  of  other  clients  (in  the  real  world),  and  the  correct  answer  is  learned 
by  all  parties,  in  both  the  real  and  ideal  models. 

□ 
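The multiplicity rule in step 3(a) can be checked with a small plaintext calculation. This is a minimal sketch, not part of the protocol: it assumes that multiset union adds multiplicities and that $Rd_d$ lowers every multiplicity by $d$ (dropping elements that reach zero), it omits the uniformly random filler elements of rule 3(b), which collide with real elements only with negligible probability, and the function names and example data are hypothetical.

```python
from collections import Counter

def reduce_by(ms: Counter, d: int) -> Counter:
    """Rd_d: an element appearing a times appears max(a - d, 0) times."""
    return Counter({e: a - d for e, a in ms.items() if a > d})

def simulated_honest_counts(result: Counter, coalition: Counter, t: int) -> Counter:
    """Rule 3(a): an element appearing b times in I is placed
    b + t - 1 - b_Gamma times in the simulated honest sets."""
    return Counter({a: b + t - 1 - coalition[a] for a, b in result.items()})

t = 3
coalition = Counter({"x": 2, "y": 1, "z": 1})  # inputs extracted from the coalition
result = Counter({"x": 2, "y": 1})             # I returned by the trusted party
honest = simulated_honest_counts(result, coalition, t)

# The coalition's view is consistent with I: Rd_{t-1} of all inputs equals I.
print(honest, reduce_by(coalition + honest, t - 1) == result)
```

Element `z`, which the coalition holds but which is not in $I$, receives no simulated honest copies, so it stays below the threshold.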

4.5 Other Applications of Our Multiset Computation Techniques

Our techniques for privacy-preserving computation of multiset operations have wide applicability beyond the protocols we discuss in Sections 4.2 and 4.3. We first discuss the composition of our techniques to compute arbitrary functions based on the intersection, union, and reduction operators. We also propose an efficient method for the Subset problem, determining whether $A \subseteq B$. As an example of the application of our techniques to problems outside the realm of set computation, we describe their use in evaluation of boolean formulae.

4.5.1  General  Computation  on  Multisets 

Our  techniques  for  privacy-preserving  multiset  operations  can  be  arbitrarily  composed  to  enable 
a  wide  range  of  privacy-preserving  multiset  computations.  In  particular,  we  give  a  grammar 
describing  functions  on  multisets  that  can  be  efficiently  computed  using  our  privacy-preserving 
operations: 

$$T ::= s \mid Rd_d(T) \mid T \cap T \mid s \cup T \mid T \cup s,$$

where $s$ represents any multiset held by some player, and $d \ge 1$. Note that any monotone function on multisets can be expressed using the grammar above, and thus our techniques for privacy-preserving multiset operations are truly general.
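To pin down what the grammar computes, the following plaintext reference evaluator (with no privacy whatsoever) may help. It is a sketch under stated assumptions: multisets are Python `Counter`s, union adds multiplicities (matching multiplication of the representing polynomials), intersection takes the minimum multiplicity, and $Rd_d$ subtracts $d$ from every multiplicity; the tuple encoding of expressions and the name `evaluate` are hypothetical.

```python
from collections import Counter

def evaluate(expr, inputs):
    """Plaintext semantics of T ::= s | Rd_d(T) | T ∩ T | s ∪ T | T ∪ s.

    expr is a nested tuple (hypothetical encoding):
      ("set", name)     a multiset held by some player
      ("rd", d, T)      Rd_d(T), with d >= 1
      ("cap", T1, T2)   T1 ∩ T2
      ("cup", T1, T2)   T1 ∪ T2, where one operand must be ("set", ...)
    """
    op = expr[0]
    if op == "set":
        return Counter(inputs[expr[1]])
    if op == "rd":
        _, d, sub = expr
        if d < 1:
            raise ValueError("Rd_d requires d >= 1")
        return Counter({e: a - d for e, a in evaluate(sub, inputs).items() if a > d})
    if op not in ("cap", "cup"):
        raise ValueError(f"unknown operator {op!r}")
    left, right = evaluate(expr[1], inputs), evaluate(expr[2], inputs)
    if op == "cap":
        return left & right  # intersection: minimum of multiplicities
    if expr[1][0] != "set" and expr[2][0] != "set":
        raise ValueError("union needs an operand held by some player")
    return left + right      # union: multiplicities add (product of polynomials)

inputs = {"A": "aab", "B": "abc", "C": "ac"}
# Rd_1((A ∪ B) ∩ (C ∪ B))
expr = ("rd", 1, ("cap", ("cup", ("set", "A"), ("set", "B")),
                         ("cup", ("set", "C"), ("set", "B"))))
print(evaluate(expr, inputs))
```

The guard in the union branch mirrors the restriction discussed next: the grammar only admits a union in which one operand is a multiset known to some player.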

It  is  worth  noting  that  the  above  grammar  only  allows  computation  of  the  union  operator 
when  at  least  one  of  the  two  operands  is  a  multiset  known  to  some  player.  Although  any 
monotone  function  on  multisets  can  be  described  by  our  grammar,  in  some  cases  it  is  desirable 
(or  more  efficient)  to  enable  the  calculation  of  the  union  operator  on  two  multisets  calculated 
from  other  multiset  operations,  such  that  neither  operand  is  known  to  any  player.  In  this 
case, we could calculate the union operation in the following way. Let $\lambda$ and $E_{pk}(f)$ be the encrypted polynomial representations of the two multisets. The players use standard techniques to privately obtain additive shares $f_1, \ldots, f_n$ of $f$, given $E_{pk}(f)$. Using these shares, they then calculate $(f_1 *_h \lambda) +_h \cdots +_h (f_n *_h \lambda) = f *_h \lambda$, the encryption of the polynomial representation of the union multiset.
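This step relies only on linearity: if $f = f_1 + \cdots + f_n$ coefficient-wise, then $\sum_i f_i * \lambda = f * \lambda$. The sketch below checks the identity in the clear over a small prime field. It omits encryption and the share-generation protocol entirely (a local random split stands in for it), and the modulus and helper names are assumptions for illustration. In the protocol, each product $f_i *_h \lambda$ would be computed by one player on the ciphertext $\lambda$.

```python
import random

P = 101  # toy prime modulus (assumption); the thesis works over the ring R

def poly_mul(f, g):
    """Product of coefficient lists (lowest degree first) mod P."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def poly_add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def additive_shares(f, n):
    """Split f into n random polynomials that sum to f coefficient-wise."""
    shares = [[random.randrange(P) for _ in f] for _ in range(n - 1)]
    last = list(f)
    for s in shares:
        last = [(a - b) % P for a, b in zip(last, s)]
    return shares + [last]

f = [6, P - 5, 1]  # (x - 2)(x - 3): the multiset {2, 3}
lam = [P - 2, 1]   # (x - 2): the multiset {2}
total = [0]
for share in additive_shares(f, 3):
    total = poly_add(total, poly_mul(share, lam))  # player i contributes f_i * lambda
print(total == poly_mul(f, lam))  # represents the union {2, 2, 3}
```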

4.5.2  Private  Subset  Relation 

Problem  Statement  Let  the  set  A  be  held  by  Alice.  The  set  B  may  be  the  result  of  an 
arbitrary  function  over  multiple  players’  input  sets  (for  example  as  calculated  using  the  grammar 
above). The Subset problem is to determine whether $A \subseteq B$ without revealing any additional
information. 

Let $\lambda$ be the encryption of the polynomial $p$ representing $B$. Note that $a \in B \iff p(a) = 0$. Alice thus evaluates the encrypted polynomial $\lambda$ at each element $a \in A$, homomorphically multiplies a random element by each encrypted evaluation, and adds these blinded ciphertexts to obtain $\beta'$. If $\beta'$ is an encryption of 0, then $A \subseteq B$ (with overwhelming probability). More formally:

1. For each element $a = A_j$ $(1 \le j \le |A|)$, the player holding $A$:

(a) calculates $\beta_j = \lambda(a)$;

(b) chooses a random element $b_j \leftarrow R$, and calculates $\beta'_j = b_j \times_h \beta_j$.

2. The player holding $A$ calculates $\beta' = \beta'_1 +_h \cdots +_h \beta'_{|A|}$.

3. All players together decrypt $\beta'$ to obtain $y$. If $y = 0$, then $A \subseteq B$.

This protocol may be simply extended to allow the set $A$ to be held by multiple players, such that $A = A_1 \cup \cdots \cup A_N$, where each set $A_j$ is held by a single player.
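The following toy sketch walks through the Subset steps end to end. It assumes textbook Paillier encryption with tiny, insecure parameters as the additively homomorphic scheme, a single decryption key in place of the players' joint threshold decryption, and an encrypted polynomial stored as a list of encrypted coefficients that is evaluated term by term; all names are hypothetical, and this is not presented as the thesis's implementation.

```python
import math
import random

# Toy Paillier parameters (twin primes 10007, 10009): NOT secure, illustration only.
P, Q = 10007, 10009
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to g = N + 1

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2

def decrypt(c):
    # Stands in for the joint (threshold) decryption of the real protocol.
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add_h(c1, c2):  # E(a) +_h E(b) = E(a + b)
    return c1 * c2 % N2

def mul_h(k, c):    # k x_h E(a) = E(k * a)
    return pow(c, k % N, N2)

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (x - r) mod N."""
    coeffs = [1]
    for r in roots:
        coeffs = [(s - r * c) % N for s, c in zip([0] + coeffs, coeffs + [0])]
    return coeffs

def eval_encrypted(enc_coeffs, a):
    """E(p(a)) from the encrypted coefficients of p and a plaintext point a."""
    acc, power = encrypt(0), 1
    for c in enc_coeffs:
        acc = add_h(acc, mul_h(power, c))
        power = power * a % N
    return acc

def is_subset(A, enc_poly_B):
    beta = encrypt(0)
    for a in A:
        b = random.randrange(1, N)  # blinding factor b_j
        beta = add_h(beta, mul_h(b, eval_encrypted(enc_poly_B, a)))
    return decrypt(beta) == 0

B = [3, 5, 7]
enc_B = [encrypt(c) for c in poly_from_roots(B)]  # lambda = E_pk(p)
print(is_subset([3, 7], enc_B), is_subset([3, 4], enc_B))
```

For $A = \{3, 4\}$ the decrypted sum is $3 b_2 \bmod N$, because $p(3) = 0$ and $p(4) = (4-3)(4-5)(4-7) = 3$; it is never zero here since 3 is invertible modulo $N$.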

4.5.3  Computation  of  CNF  Formulae 

Finally,  we  show  that  our  techniques  on  private  multiset  operations  have  applications  outside 
of  the  realm  of  multiset  computations.  As  a  concrete  example,  we  show  that  we  can  apply 
our  techniques  to  efficient  privacy-preserving  evaluation  of  boolean  formulae,  in  particular,  the 
conjunctive  normal  form  (CNF).  A  formula  in  CNF  is  a  conjunction  of  a  number  of  disjunctive 
clauses,  each  of  which  is  formed  of  several  variables  (or  their  negations). 


Problem Statement Let $\varphi$ be a public CNF boolean formula on variables $V_1, \ldots, V_k$. Each player knows the truth assignment to some subset of $\{V_1, \ldots, V_k\}$, where each variable is known to at least one player. The players cooperatively calculate the truth value of $\varphi$ under this assignment, without revealing any other information about the variable assignment.

We address this problem by introducing multiset representations of boolean formulae. Let True and False be distinct elements of $R$ (e.g., 0 and 1). For each variable in the formula, let the multiset representation of the variable be $\{True\}$ if its value is true, and $\{False\}$ if its value is false. Then, replace each $\vee$ operator in $\varphi$ with a $\cup$ operator, and each $\wedge$ operator with a $\cap$ operator. If True is a member of the resulting multiset, then $\varphi$ is true. The polynomial multiset representation of the CNF formula can now be evaluated by the players through use of our privacy-preserving multiset operations, as the function is described in the grammar given in Section 4.5.1.

We  can  also  solve  many  variations  on  boolean  formula  evaluation  using  our  techniques.  For 
example,  we  might  require,  instead  of  using  the  boolean  operations,  that  at  least  t  of  the 
variables  in  a  clause  be  satisfied.  Note  that  using  our  techniques  can  be  more  efficient  than 
standard multiparty techniques, as standard techniques require an expensive multiplication operation, involving all players, to compute the $\wedge$ operator [5, 28].
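A plaintext reference for the CNF encoding (again with no privacy) is sketched below. The encoding of True and False as 1 and 0, the representation of a clause as a list of `(variable, negated)` pairs, and the function names are assumptions. The optional `at_least` parameter illustrates one way to express the threshold variant mentioned above: reducing each clause multiset by $Rd_{t-1}$ keeps True only if at least $t$ literals are satisfied.

```python
from collections import Counter

TRUE, FALSE = 1, 0  # two distinct ring elements (assumed encoding)

def literal(assignment, var, negated=False):
    """Multiset {True} or {False} for a (possibly negated) variable."""
    value = assignment[var] != negated
    return Counter([TRUE if value else FALSE])

def evaluate_cnf(clauses, assignment, at_least=1):
    """Disjunction -> union, conjunction -> intersection.

    With at_least > 1, each clause multiset is reduced by Rd_{at_least - 1},
    so True survives only if at least `at_least` literals are satisfied.
    """
    result = None
    for clause in clauses:
        c = Counter()
        for var, negated in clause:
            c += literal(assignment, var, negated)  # union: s ∪ T
        c = Counter({e: a - (at_least - 1) for e, a in c.items() if a >= at_least})
        result = c if result is None else result & c  # intersection
    return result[TRUE] > 0

# phi = (V1 or not V2) and (V2 or V3)
phi = [[("V1", False), ("V2", True)], [("V2", False), ("V3", False)]]
print(evaluate_cnf(phi, {"V1": False, "V2": False, "V3": True}))
print(evaluate_cnf(phi, {"V1": False, "V2": True, "V3": False}))
```

Each clause is built by repeatedly taking the union of a player-held literal multiset with the multiset computed so far, which matches the $s \cup T$ production of the grammar.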


4.6  Proof  of  Mathematical  Lemmas 

In  this  section,  we  prove  Lemmas  2  and  4,  as  well  as  several  lemmas  on  which  these  proofs 
depend. 


4.6.1  Proof  of  Lemma  2 

Lemma 19. Let the ring $R$ have subfields $\mathcal{F}_1$ and $\mathcal{F}_2$. Define the gcd of two polynomials $f, g \in R[x]$ as the output of Euclid's algorithm for computing the greatest common divisor; if Euclid's algorithm fails, the gcd is undefined.

Any PPT adversary who can obtain (with non-negligible probability) two polynomials for which the gcd is undefined can determine the size of the subfields of $R$ (with non-negligible probability).


Proof. If the leading coefficient of a polynomial $p$ is in $R^*$, then for any polynomial $b \in R[x]$, there exist unique polynomials $q, r$ such that $b = q * p + r$ $(\deg(r) < \deg(p))$ [51]. Note that this is the sole calculation necessary to compute the Euclidean gcd algorithm, and that this algorithm runs in PPT. Thus, if this algorithm fails to compute $\gcd(f, g)$, it must have calculated some polynomial $f_i$ as an intermediate result such that the leading coefficient of $f_i$ is in $R \setminus (R^* \cup \{0\})$. The elements of $R \setminus (R^* \cup \{0\})$ are the nonzero elements without a multiplicative inverse: multiples of $|\mathcal{F}_{i'}|$ $(1 \le i' \le 2)$. Thus, the polynomial $f_i$ that causes Euclid's algorithm to fail must have a leading coefficient that is a multiple of the size of some subfield $\mathcal{F}_{i'}$ $(1 \le i' \le 2)$. Given this coefficient, one can compute $|\mathcal{F}_1|, |\mathcal{F}_2|$ in probabilistic polynomial time by using the Euclidean algorithm over integers. As, by assumption in Section 3.2.1, this problem is hard over our ring $R$, we can (with overwhelming probability) compute $\gcd(f, g)$ using Euclid's algorithm. □
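The reduction in this proof can be illustrated over a toy modulus. In the sketch below, $R = \mathbb{Z}_{77}$ is an assumption for illustration only, with $77 = 7 \cdot 11$ playing the role of the hard-to-factor modulus. Polynomial Euclid raises an exception exactly when a divisor's leading coefficient has no inverse, and the integer gcd of that coefficient with the modulus then exposes a factor. The class and function names are hypothetical.

```python
import math

class NonInvertibleLeadingCoefficient(Exception):
    """Raised when Euclid's algorithm meets a nonzero, non-unit leading coefficient."""
    def __init__(self, coeff, factor):
        super().__init__(f"leading coefficient {coeff} shares factor {factor} with N")
        self.factor = factor

def trim(f):
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def poly_rem(b, p, n):
    """Remainder of b divided by p over Z_n (coefficients lowest degree first)."""
    b, p = trim([c % n for c in b]), trim([c % n for c in p])
    lead = p[-1]
    g = math.gcd(lead, n)
    if g != 1:  # lead is a nonzero non-unit, so g is a nontrivial factor of n
        raise NonInvertibleLeadingCoefficient(lead, g)
    inv = pow(lead, -1, n)
    while len(b) >= len(p):
        q = b[-1] * inv % n
        shift = len(b) - len(p)
        for i, c in enumerate(p):
            b[shift + i] = (b[shift + i] - q * c) % n
        b = trim(b)
    return b

def poly_gcd(f, g, n):
    f, g = trim([c % n for c in f]), trim([c % n for c in g])
    while g:
        f, g = g, poly_rem(f, g, n)
    return f

N = 7 * 11
print(poly_gcd([2, -3, 1], [3, -4, 1], N))  # gcd of (x-1)(x-2) and (x-1)(x-3)
try:
    poly_gcd([0, 0, 1], [1, 7], N)          # 7x + 1: leading coefficient not a unit
except NonInvertibleLeadingCoefficient as e:
    print("factor of N:", e.factor)
```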

Remark 20. For all $a, b \in R$, if $a \in R^*$, there exists no element $c \in R$ $(c \ne b)$ such that $ab = ac$.

Lemma 21. For all polynomials $f, g \in R[x]$ such that the leading coefficient of $f$ is a member of $R^*$, there exists no polynomial $y \in R[x]$ such that $f * g = f * y$, $g \ne y$.


Proof. For two polynomials to be equal, each of their coefficients must be equal. Thus, we may express the condition $f * g = f * y$ as follows for each $\ell \ge 0$:

$$(f * g)[\ell] = (f * y)[\ell]$$
$$\sum_{j=0}^{\ell} f[\ell - j]\, g[j] = \sum_{j=0}^{\ell} f[\ell - j]\, y[j]$$
$$\sum_{j=0}^{\ell} f[\ell - j]\,(g[j] - y[j]) = 0$$

We prove by induction that $g[\ell] = y[\ell]$ $(0 \le \ell \le \deg(g))$. As a base case, we prove that $g[\deg(g)] = y[\deg(g)]$:

$$\sum_{j=0}^{\deg(f)+\deg(g)} f[\deg(f) + \deg(g) - j]\,(g[j] - y[j]) = 0$$
$$f[\deg(f)]\,(g[\deg(g)] - y[\deg(g)]) = 0$$

Because $f[\deg(f)] \in R^*$ (by definition, $f[\deg(f)] \ne 0$), by Remark 20 $g[\deg(g)] = y[\deg(g)]$.

We now make the strong inductive assumption that for $i \le \ell \le \deg(g)$, $g[\ell] = y[\ell]$. Next, we use this assumption to prove that $g[i-1] = y[i-1]$:

$$\sum_{j=0}^{\deg(f)+i-1} f[\deg(f) + i - 1 - j]\,(g[j] - y[j]) = 0$$
$$\sum_{j=0}^{\deg(f)+i-1} f[\deg(f) + i - 1 - j]\,(g[j] - y[j]) = f[\deg(f)]\,(g[i-1] - y[i-1])$$
$$f[\deg(f)]\,(g[i-1] - y[i-1]) = 0$$

Because $f[\deg(f)] \in R^*$ (by definition, $f[\deg(f)] \ne 0$), by Remark 20 $g[i-1] = y[i-1]$. Thus, by the inductive principle, all coefficients of $g$ are identical to those of $y$ up to $\deg(g)$. We now prove that $\deg(y) \le \deg(g)$, showing that $y = g$ (and thus that our lemma is true).

If $\deg(y) > \deg(g)$, then $y[\ell] \ne 0$ for some $\ell > \deg(g)$ (let $\ell$ be the maximal such index, so that $\ell = \deg(y)$):

$$g[\ell] = 0$$
$$g[\ell] - y[\ell] \ne 0$$
$$\sum_{j=0}^{\deg(f)+\ell} f[\deg(f) + \ell - j]\,(g[j] - y[j]) = 0$$

We may remove from the sum all terms with $j < \ell$ (for which $f[\deg(f) + \ell - j] = 0$) and all terms with $j > \ell$ (for which $g[j] - y[j] = 0$), leaving us with the following equation:

$$f[\deg(f)]\,(g[\ell] - y[\ell]) = 0$$

Because $f[\deg(f)] \in R^*$ (by definition, $f[\deg(f)] \ne 0$), by Remark 20 $y[\ell] = g[\ell] = 0$. Thus, no such index $\ell$ can exist; $\deg(y) \le \deg(g)$. Because we also know that all terms of $y$ up to $\deg(g)$ are identical to those of $g$, we may conclude that $y = g$, and thus that our lemma is true. □
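The hypothesis that the leading coefficient of $f$ is a unit cannot be dropped. The sketch below uses the toy ring $\mathbb{Z}_6$, an assumption chosen because 2 and 3 are zero divisors there. It exhibits $f * g = f * y$ with $g \ne y$ when the leading coefficient of $f$ is 2, and checks by exhaustion that multiplication by the monic $f = x + 2$ is injective on polynomials of degree at most 1. The helper name is hypothetical.

```python
from itertools import product

N = 6  # toy ring Z_6 (assumption): 2 and 3 are zero divisors

def poly_mul_mod(f, g, n=N):
    """Product of coefficient lists (lowest degree first) over Z_n."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % n
    return out

# Leading coefficient 2 is not a unit: f * g == f * y although g != y.
f_bad, g, y = [0, 2], [3], [0]
cancellation_fails = poly_mul_mod(f_bad, g) == poly_mul_mod(f_bad, y) and g != y

# Leading coefficient 1 is a unit: y -> f * y is injective on degree <= 1.
f_good = [2, 1]  # x + 2
images = {tuple(poly_mul_mod(f_good, list(c))) for c in product(range(N), repeat=2)}
print(cancellation_fails, len(images) == N ** 2)
```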

Lemma 22. For all polynomials $f_1, f_2, g_1, g_2 \in R[x]$, if $f_1 * g_1 = f_2 * g_2$ and $\gcd(f_1, f_2) = 1$, then $f_2 \mid g_1$.


Proof. We defined $\gcd(f_1, f_2)$ with $f_1, f_2 \in R[x]$ as the output of Euclid's algorithm for calculating the gcd (which succeeds with overwhelming probability by Lemma 19). Note that from the intermediate results of this calculation we can determine polynomials $p_1, p_2 \in R[x]$ such that $p_1 * f_1 + p_2 * f_2 = 1$, as $\gcd(f_1, f_2) = 1$.

$$p_1 * f_1 + p_2 * f_2 = 1$$
$$p_1 * f_1 = 1 - p_2 * f_2$$
$$g_1 * p_1 * f_1 = g_1 (1 - p_2 * f_2)$$
$$p_1 * (f_2 * g_2) = g_1 (1 - p_2 * f_2)$$
$$g_1 = f_2 * (p_1 * g_2 + p_2 * g_1)$$

Because there exists a polynomial $p_1 * g_2 + p_2 * g_1 \in R[x]$ such that $g_1 = f_2 * (p_1 * g_2 + p_2 * g_1)$, $f_2 \mid g_1$. □

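Lemma 2. Let $f, g$ be polynomials in $R[x]$ where $R$ is a ring such that no PPT adversary can find the size of its subfields with non-negligible probability, $\deg(f) = \deg(g) = \alpha$, $\beta \ge \alpha$, $\gcd(f, g) = 1$, and $f[\deg(f)] \in R^* \wedge g[\deg(g)] \in R^*$. Let $r = \sum_{i=0}^{\beta} r[i] x^i$ and $s = \sum_{i=0}^{\beta} s[i] x^i$, where $\forall_{0 \le i \le \beta}\; r[i] \leftarrow R$, $\forall_{0 \le i \le \beta}\; s[i] \leftarrow R$ (independently).

Let $u = f * r + g * s = \sum_{i=0}^{\alpha+\beta} u[i] x^i$. Then $\forall_{0 \le i \le \alpha+\beta}\; u[i]$ are distributed uniformly and independently over $R$.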

Proof. For clarity, we give a brief outline of the proof before proceeding to the details. Given any fixed polynomials $f, g, u$, we calculate the number $z$ of $r, s$ pairs such that $f * r + g * s = u$. We may then check that, given any fixed polynomials $f, g$, the total number of possible $r, s$ pairs, divided by $z$, is equal to the number of possible result polynomials $u$. This implies that, if $\gcd(f, g) = 1$ and we choose the coefficients of $r, s$ uniformly and independently from $R$, the coefficients of the result polynomial $u$ are distributed uniformly and independently over $R$.

We now determine the value of $z$, the number of $r, s$ pairs such that $f * r + g * s = u$. Let us assume that for this particular $u$ there exists at least one pair $r, s$ such that $f * r + g * s = u$. For any pair $r', s'$ such that $f * r' + g * s' = u$, then

$$f * r + g * s = f * r' + g * s'$$
$$f * (r - r') = g * (s' - s)$$

As $\gcd(f, g) = 1$, we may conclude that $g \mid (r - r')$ and $f \mid (s' - s)$ by Lemma 22. Let $p * g = r - r'$ and $p * f = s' - s$.

We must now show that each polynomial $p$, of degree at most $\beta - \alpha$, determines exactly one unique pair $r', s'$ such that $f * r' + g * s' = u$, and that there exist no pairs $r', s'$ such that $f * r' + g * s' = u$ that are not generated by a single choice of the polynomial $p$ of degree at most $\beta - \alpha$.

To show that there exist no pairs $r', s'$ such that $f * r' + g * s' = u$ that are not generated by some choice of the polynomial $p$, of degree at most $\beta - \alpha$, we let $p' * g = r - r'$ and $p * f = s' - s$ for any $p', p$ of degree at most $\beta - \alpha$. As we proved that $g \mid (r - r')$ and $f \mid (s' - s)$, we can represent $r - r'$ and $s' - s$ in this fashion without loss of generality.

$$f * (r - r') = g * (s' - s)$$
$$f * (p' * g) = g * (p * f)$$

As the leading coefficients of $f$ and $g$ are members of $R^*$, we may apply Lemma 21 to remove both $f$ and $g$ from our equation, leaving the fact that $p = p'$. Thus, there exist no pairs $r', s'$ such that $f * r' + g * s' = u$ that are not generated by some choice of the polynomial $p$, of degree at most $\beta - \alpha$.

To show that each polynomial $p$, of degree at most $\beta - \alpha$, determines exactly one unique pair $r', s'$ such that $f * r' + g * s' = u$, note that $r' = r - g * p$ and $s' = s + f * p$; as we have fixed $f, g, r, s$, a choice of $p$ determines both $r', s'$. If these assignments were not unique, there would exist polynomials $p \ne p'$ such that either $r' = r - g * p = r - g * p'$ or $s' = s + f * p = s + f * p'$. These conditions imply that either $g * p = g * p'$ or $f * p = f * p'$ for some polynomials $p \ne p'$; we know this is impossible (when the leading coefficients of $f$ and $g$ are members of $R^*$) by Lemma 21.

Thus the number of polynomials $p$, of degree at most $\beta - \alpha$, is exactly equivalent to the number of $r, s$ pairs such that $f * r + g * s = u$. As there are $|R|^{\beta-\alpha+1}$ such polynomials $p$,

$$z = |R|^{\beta-\alpha+1}$$

We now show that the total number of $r, s$ pairs, divided by $z$, is equal to the number of result polynomials $u$. There are $|R|^{2\beta+2}$ $r, s$ pairs. As $|R|^{2\beta+2} / |R|^{\beta-\alpha+1} = |R|^{\alpha+\beta+1}$ and there are $|R|^{\alpha+\beta+1}$ possible result polynomials, we have proved the lemma true. □
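The counting argument can be verified by brute force on a toy instance. The sketch below assumes the field $\mathbb{Z}_3$ in place of $R$, $\alpha = 1$, $\beta = 2$, $f = x + 1$ and $g = x + 2$ (coprime, with unit leading coefficients). It enumerates all $|R|^{2\beta+2} = 729$ pairs $(r, s)$ and checks that each of the $|R|^{\alpha+\beta+1} = 81$ possible polynomials $u$ arises exactly $z = |R|^{\beta-\alpha+1} = 9$ times.

```python
from collections import Counter
from itertools import product

Q = 3            # toy field Z_3 standing in for R (assumption)
ALPHA, BETA = 1, 2
f = [1, 1]       # x + 1, leading coefficient 1 (a unit)
g = [2, 1]       # x + 2, coprime to f over Z_3

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out

counts = Counter()
for r in product(range(Q), repeat=BETA + 1):      # r in R^beta[x]
    for s in product(range(Q), repeat=BETA + 1):  # s in R^beta[x]
        u = [(a + b) % Q for a, b in zip(poly_mul(f, list(r)), poly_mul(g, list(s)))]
        counts[tuple(u)] += 1

z = Q ** (BETA - ALPHA + 1)
print(len(counts) == Q ** (ALPHA + BETA + 1), set(counts.values()) == {z})
```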

4.6.2  Proof  of  Lemma  4 

Lemma 23. For all polynomials $q \in R[x]$, $t \ge 0$, $m \ge 1$, $(x - a)^m \mid ((x - a)^{m+t} q)^{(t)}$.

Proof. We prove this lemma by induction.

As a base case, we prove the lemma for $t = 0$.

$$((x - a)^m q)^{(0)} = (x - a)^m q$$

Thus, $(x - a)^m \mid ((x - a)^m q)^{(0)}$.

Next, we make the inductive assumption for $t = i$: $(x - a)^m \mid ((x - a)^{m+i} q)^{(i)}$. Using this assumption, we may prove the lemma holds for $t = i + 1$.

$$((x - a)^{m+i+1} q)^{(i+1)} = \left((m + i + 1)(x - a)^{m+i} q + (x - a)^{m+i+1} q^{(1)}\right)^{(i)} = \left((x - a)^{m+i}\left((m + i + 1) q + (x - a) q^{(1)}\right)\right)^{(i)}$$

Thus, by the inductive assumption, $(x - a)^m \mid ((x - a)^{m+i+1} q)^{(i+1)}$. By the inductive principle, our lemma holds. □

Lemma 24. For all polynomials $q \in R[x]$ such that $(x - a) \nmid q$, $t \ge 0$, $m \ge 1$, $(x - a)^{m-t+1} \nmid ((x - a)^m * q)^{(t)}$.

Proof. We prove this lemma by induction. Note that we may uniquely represent $q$ as $(x - a) b_1 + r$ such that $r \ne 0$ and $\deg(r) < 1$.

As a base case, we prove the lemma for $t = 0$.

$$((x - a)^m q)^{(0)} = (x - a)^m q = (x - a)^m ((x - a) b_1 + r) = (x - a)^{m+1} b_1 + (x - a)^m r$$

Because $r \ne 0$ and $\deg(r) < 1$, we know that $(x - a)^m r \ne 0$ and $\deg((x - a)^m r) < \deg((x - a)^{m+1})$, so $(x - a)^{m+1} \nmid (x - a)^m r$, and thus $(x - a)^{m+1} \nmid ((x - a)^m q)^{(0)}$.

Next, we make the inductive assumption for $t = k$: $(x - a)^{m-k+1} \nmid ((x - a)^m * q)^{(k)}$. Using this assumption, we may prove that the lemma holds for $t = k + 1$.

$$((x - a)^m * q)^{(k+1)} = \left(m (x - a)^{m-1} q + (x - a)^m q^{(1)}\right)^{(k)} = \left((x - a)^{m-1}\left(m q + (x - a) q^{(1)}\right)\right)^{(k)}$$

Let $m' = m - 1$ and $q' = m q + (x - a) q^{(1)}$. We know through the inductive assumption that:

$$(x - a)^{m' - k + 1} \nmid \left((x - a)^{m'} q'\right)^{(k)}$$
$$(x - a)^{m - (k+1) + 1} \nmid ((x - a)^m * q)^{(k+1)}$$

Thus, by the inductive principle, our lemma holds. □

Lemma 25. For all polynomials $q \in R[x]$ such that $(x - a) \nmid q$, and all $t \ge 0$, $(x - a) \nmid ((x - a)^t q)^{(t)}$.

Proof. We prove this lemma by induction. Note that we may uniquely represent $q$ as $(x - a) b_1 + r$ such that $r \ne 0$ and $\deg(r) < 1$.

As a base case, we prove the lemma for $t = 0$.

$$((x - a)^0 q)^{(0)} = q$$

Because $(x - a) \nmid q$, $(x - a) \nmid ((x - a)^0 q)^{(0)}$.

Next, we make the inductive assumption for $t = i$: $(x - a) \nmid ((x - a)^i q)^{(i)}$. Using this assumption, we may prove that the lemma holds for $t = i + 1$.

$$((x - a)^{i+1} q)^{(i+1)} = \left((i + 1)(x - a)^i q + (x - a)^{i+1} q^{(1)}\right)^{(i)} = \left((i + 1)(x - a)^i q\right)^{(i)} + \left((x - a)^{i+1} q^{(1)}\right)^{(i)}$$

By Lemma 23, $(x - a) \mid ((x - a)^{i+1} q^{(1)})^{(i)}$. Thus, for some unique polynomial $b_2 \in R[x]$, $((x - a)^{i+1} q^{(1)})^{(i)} = (x - a) b_2$. By the inductive assumption, $(x - a) \nmid ((i + 1)(x - a)^i q)^{(i)}$. Thus, for some unique polynomials $b_3, r_3 \in R[x]$ (such that $r_3 \ne 0$, $\deg(r_3) < 1$),

$$((i + 1)(x - a)^i q)^{(i)} = (x - a) b_3 + r_3$$
$$((x - a)^{i+1} q)^{(i+1)} = ((x - a) b_3 + r_3) + (x - a) b_2 = (x - a)(b_3 + b_2) + r_3$$

As $r_3 \ne 0$ and $\deg(r_3) < 1$, $(x - a) \nmid ((x - a)^{i+1} q)^{(i+1)}$. By the inductive principle, our lemma holds. □
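Taken together, Lemmas 23 to 25 say that when $(x - a) \nmid q$ and $t \le m$, the root $a$ has multiplicity exactly $m - t$ in $((x - a)^m q)^{(t)}$. The sketch below checks this numerically over the integers, a characteristic-zero stand-in for $R$ (an assumption) in which constants such as $m$ and $i + 1$ introduced by differentiation never vanish. The helper names and example values are hypothetical.

```python
def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def derivative(f, times=1):
    for _ in range(times):
        f = [i * c for i, c in enumerate(f)][1:]
    return f

def root_multiplicity(f, a):
    """Largest e such that (x - a)^e divides the nonzero polynomial f."""
    e = 0
    while len(f) > 1:
        acc, quotient = 0, []
        for c in reversed(f):    # synthetic division by (x - a)
            acc = acc * a + c
            quotient.append(acc)
        if quotient.pop() != 0:  # the remainder f(a) is nonzero
            break
        f = quotient[::-1]       # back to lowest-degree-first order
        e += 1
    return e

a, m = 2, 4
q = [5, 1]                       # q = x + 5, so (x - a) does not divide q
f = q
for _ in range(m):
    f = poly_mul(f, [-a, 1])     # f = (x - a)^m * q
print([root_multiplicity(derivative(f, t), a) for t in range(m + 1)])
```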
Lemma 4.


Let $F_j \in R[x]$ $(0 \le j \le d)$ each be of degree $j$ such that $\gcd(F_0, \ldots, F_d) = 1$. For all elements $a \in R$ such that $\forall_{0 \le j \le d}\; (x - a) \nmid F_j$, all $q \in R[x]$ such that $(x - a) \nmid q$, and $r_j \leftarrow R^{\deg(f)}[x]$ $(0 \le j \le d)$:

• if $m > d$, $f = (x - a)^m * q \Rightarrow (x - a)^{m-d} \mid \sum_{j=0}^{d} f^{(j)} * F_j * r_j \;\wedge\; (x - a)^{m-d+1} \nmid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$

• if $m \le d$, $f = (x - a)^m * q \Rightarrow (x - a) \nmid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$

with overwhelming probability.

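Proof. • If $m \le d$, by Lemma 25, there exists with overwhelming probability at least one index $j$ $(0 \le j \le d)$ such that $(x - a) \nmid ((x - a)^m * q)^{(j)} * F_j * r_j$.^ Let $A = \{j \mid (x - a) \nmid ((x - a)^m * q)^{(j)} * F_j * r_j\}$ and $B = \{j \mid 0 \le j \le d \wedge j \notin A\}$. Each polynomial $((x - a)^m * q)^{(j)} * F_j * r_j$ can be represented as $(x - a) q_j + s_j$. By the definition of $A$ and $B$, $\forall_{j \in A}\; s_j \ne 0$ and $\forall_{j \in B}\; s_j = 0$.

$$\sum_{j=0}^{d} ((x - a)^m * q)^{(j)} * F_j * r_j = \sum_{j \in A} \left((x - a) q_j + s_j\right) + \sum_{j \in B} (x - a) q_j = (x - a) \sum_{j=0}^{d} q_j + \sum_{j \in A} s_j$$

Note that $\sum_{j \in A} s_j \ne 0$ with overwhelming probability. Thus, as $\deg\left(\sum_{j \in A} s_j\right) < 1$, we may conclude that $(x - a) \nmid \sum_{j=0}^{d} ((x - a)^m * q)^{(j)} * F_j * r_j$ with overwhelming probability.

• If $m > d$, by Lemma 23, $(x - a)^{m-d} \mid ((x - a)^m * q)^{(j)}$ for $0 \le j \le d$. Thus, by the distributive property over rings, $(x - a)^{m-d} \mid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$. Note also that by Lemma 24, $(x - a)^{m-d+1} \nmid ((x - a)^m * q)^{(d)}$, while by Lemma 23, $(x - a)^{m-d+1} \mid ((x - a)^m * q)^{(j)}$ for $0 \le j < d$. By the analysis above, with overwhelming probability, $f = (x - a)^m * q \Rightarrow (x - a)^{m-d} \mid \sum_{j=0}^{d} f^{(j)} * F_j * r_j \wedge (x - a)^{m-d+1} \nmid \sum_{j=0}^{d} f^{(j)} * F_j * r_j$.

□


^Note that $(x - a) \nmid r_j$ with overwhelming probability, as $r_j$ is random and of polynomial size.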
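Lemma 4 is what makes the reduction step work. With $d = t - 1$, an element of multiplicity $m \ge t$ in $p$ is a root of $\Phi = \sum_{\ell=0}^{t-1} F_\ell * p^{(\ell)} * r_\ell$, while an element of multiplicity $m < t$ is not (with overwhelming probability). The plaintext sketch below checks this over the prime field $GF(2^{61} - 1)$, an assumption standing in for $R$. It evaluates $\Phi$ pointwise instead of expanding it and omits encryption, blinding, and shuffling. The choice $F_\ell = x^\ell + 1$ (so $F_0 = 2$) and all names are hypothetical.

```python
import random
from collections import Counter

PRIME = 2**61 - 1  # toy field GF(PRIME) standing in for R (assumption)

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (x - r)."""
    coeffs = [1]
    for r in roots:
        coeffs = [(s - r * c) % PRIME for s, c in zip([0] + coeffs, coeffs + [0])]
    return coeffs

def derivative(coeffs):
    return [i * c % PRIME for i, c in enumerate(coeffs)][1:]

def evaluate(coeffs, a):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule
        acc = (acc * a + c) % PRIME
    return acc

def over_threshold(multisets, t):
    """Input elements (with multiplicity) at which Phi vanishes."""
    elements = [e for s in multisets for e in s]
    p = poly_from_roots(elements)  # p represents the union of all inputs
    derivs = [p]
    for _ in range(t - 1):
        derivs.append(derivative(derivs[-1]))
    # Hypothetical fixed polynomials F_l = x^l + 1; none has a root among
    # the small test elements used below.
    F = [[2]] + [[1] + [0] * (l - 1) + [1] for l in range(1, t)]
    r = [[random.randrange(PRIME) for _ in p] for _ in range(t)]  # random r_l

    def phi(a):
        return sum(evaluate(F[l], a) * evaluate(derivs[l], a) * evaluate(r[l], a)
                   for l in range(t)) % PRIME

    return Counter(e for e in elements if phi(e) == 0)

players = [[1, 2, 3], [2, 3, 4], [3, 5, 6]]
print(over_threshold(players, 2), over_threshold(players, 3))
```

As in step 7 of the protocol, each element at or above the threshold is reported once per copy held by the players.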
Chapter 5


Hot Item Identification and Publication


Ideally,  all  privacy-preserving  distributed  information  sharing  protocols  would  use  extremely 
strong  definitions  of  security.  In  Chapter  4,  we  constructed  protocols  for  many  important 
applications  that  are  secure  under  standard  notions  of  cryptographic  security.  Unfortunately, 
these  protocols  are  not  sufficiently  efficient  or  robust  for  certain  demanding  applications;  we 
believe  that  no  protocols  so  stringently  secure  can  be  sufficiently  efficient.  Thus,  in  this  chapter, 
we  examine  variants  of  the  Over-Threshold  Set-Union  problem  from  Chapter  4.  We  construct 
protocols  for  these  variants,  the  hot  item  identification  and  publication  problems,  that  are  both 
extremely  efficient  and  robust.  In  order  to  achieve  these  levels  of  efficiency,  we  compromise 
privacy.  Instead  of  being  cryptographically  secure,  our  protocols  achieve  the  novel  notions  of 
owner  and  data  privacy. 

We  begin  in  Section  5.1  by  describing  the  hot  item  identification  and  publication  problems, 
as  well  as  defining  our  security  properties  of  owner  and  data  privacy.  We  then  construct  efficient 
and  robust  protocols  for  the  hot  item  identification  and  publication  problems  in  Sections  5.2 
and  5.3,  respectively.  In  Section  5.4  we  analyze  the  security  of  these  protocols,  before  describing 
several  extensions  in  Section  5.5.  To  show  the  utility  and  efficiency  of  our  hot  item  identification 
protocol,  we  perform  experiments  in  Section  5.6  in  which  we  perform  distributed  detection  of 
simulated  worms  in  real  network  traffic. 


5.1  Problem  Definition  and  Desired  Properties 

In  this  section,  we  give  a  formal  definition  of  the  hot  item  identification  and  publication  problems 
and  our  novel  security  and  privacy  properties. 

Problem Definition. In our problem setting, each player $i$ $(1 \le i \le n)$ in the system holds a private dataset $S_i$ of $m$ elements from a domain denoted $M$.^ Let a $k$-threshold hot item (referred to in this chapter as a hot item) be any element $a$ that appears in at least $k$ distinct players' private input datasets. $H_k$ is the set of all $k$-threshold hot items in the system. All items not in $H_k$ are called cold items. We define the hot item identification problem as follows: each player $i$ $(1 \le i \le n)$ learns $S_i \cap H_k$, denoted $P_i$.

^Note  that  here  for  simplicity,  we  assume  each  player  has  the  same  number  of  items  in  its  private  set;  our 
protocols  can  be  easily  extended  to  scenarios  where  this  is  not  the  case. 
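As a reference for what the problem definition computes, the sketch below evaluates $H_k$ and $P_i$ in the clear, with none of the privacy properties defined below. Converting each dataset to a set makes every player count at most once per item, the property the inflation attack (discussed below) tries to break. The function names and data are hypothetical.

```python
from collections import Counter

def hot_items(datasets, k):
    """H_k: items held by at least k distinct players (duplicates count once)."""
    holders = Counter(item for s in datasets for item in set(s))
    return {item for item, c in holders.items() if c >= k}

def identification_outputs(datasets, k):
    """P_i = S_i ∩ H_k for every player i."""
    hot = hot_items(datasets, k)
    return [set(s) & hot for s in datasets]

datasets = [["a", "b"], ["a", "c", "c", "c"], ["a", "b", "d"]]
print(hot_items(datasets, 2), identification_outputs(datasets, 2))
```

Item `c` stays cold for $k = 2$ even though one player lists it three times.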
Adversary Model. In this chapter, we consider a strong adversary model. First, we assume that the adversary can eavesdrop on all communication. Second, we assume that a fraction $\lambda$ of the players may maliciously and arbitrarily misbehave, while the rest of the players are honest-but-curious.

A  malicious  player  may  not  follow  the  protocol;  instead  he  may  behave  arbitrarily.  For 
example, malicious players could collude such that if $\lambda n$ exceeds $k$, they can make an arbitrary
cold  item  hot  by  each  reporting  it  as  in  their  private  input  set^.  However,  this  issue  is  out  of 
the  scope  of  our  problem.  Because  any  player  has  the  freedom  to  choose  his  private  input  set, 
any  protocol  in  this  setting  is  vulnerable  to  this  manipulation. 

Another  attack  a  malicious  player  could  mount  is  to  claim  to  have  many  copies  of  one  item 
to  try  to  boost  the  frequency  of  a  cold  item  to  be  high  enough  to  become  a  hot  item;  we  call  this 
the  inflation  attack.  Thus,  our  protocols  must  ensure  that  during  hot  item  identification,  each 
player  can  only  contribute  to  the  frequency  count  of  an  item  at  most  once.  Note  that  this  is  a 
challenging  task,  as  we  will  need  to  preserve  the  players’  privacy  as  well  as  security.  We  have 
designed  a  novel  cryptographic  method,  one-show  tags  (Section  5.2.2),  to  prevent  the  inflation 
attack  and  preserve  the  players’  privacy.  This  novel  construction  could  be  of  independent 
interest. 

Moreover,  malicious  players  could  attempt  to  forge  cryptographic  signatures,  send  bogus 
messages,  try  to  learn  honest  players’  private  data,  or  fool  other  players.  Note  that  our  protocol 
defends  against  all  of  these  attacks,  by  provably  achieving  the  properties  of  owner  and  data 
privacy  as  defined  below. 

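Definition of Correctness. Given the false positive rate $\delta_+$ and the false negative rate $\delta_-$, a protocol is correct if, for each player $i$ $(1 \le i \le n)$:

• $\forall a \in S_i \cap H_k$, player $i$ learns that $a \in P_i$ with probability at least $1 - \delta_-$,

• $\forall a \in S_i \cap \overline{H_k}$, player $i$ learns that $a \notin P_i$ with probability at least $1 - \delta_+$.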

To  allow  more  flexible  and  efficient  protocols,  while  preserving  a  reasonable  level  of  privacy 
for  participants,  we  define  two  new  concepts  of  security:  owner  privacy  and  data  privacy. 

Definition  of  Owner  Privacy.  Intuitively,  a  protocol  that  is  owner  private  prevents  an 
adversary  from  determining  that  any  element  a  is  part  of  a  particular  honest  player’s  private 
input  set.  Even  an  element  that  is  known  to  be  hot  cannot  be  shown  to  be  held  by  any  player; 
the  elements  of  a  player’s  private  input  set  are  anonymous. 

Formally, we say a k-hot item identification protocol satisfies owner privacy if no coalition of
at most $\lambda n$ malicious PPT adversaries can gain more than a negligible advantage in associating
an item a with an honest player i ($1 \le i \le n$) such that $a \in S_i$, over a situation in which all
players are simply given the set of hot items P and their frequencies.

Definition  of  Data  Privacy.  Data  privacy  concerns  protection  of  the  union  of  the  players’ 
inputs,  especially  cold  items.  Ideally,  a  truly  privacy-preserving  hot-item  identification  protocol 
should  not  reveal  any  information  about  the  cold  items;  we  give  such  a  protocol  in  Chapter  4. 
However,  such  a  strong  privacy  definition  entails  inefficient  solutions,  as  the  cryptographic 
techniques  add  too  much  overhead  in  computation  and  communication  for  many  situations. 
Thus,  in  this  chapter,  we  study  the  tradeoff  between  privacy  and  efficiency;  in  particular,  we 
rigorously  define  a  relaxed  notion  of  privacy,  show  that  it  provides  sufficient  protection  for  many 
applications,  and  design  efficient  solutions  to  achieve  it. 

*In fact, even when $\lambda n$ is less than k, they could make a cold item that appears in at least $k - \lambda n$
non-malicious players' private input sets appear to be a hot item.
[Figure 5.1 diagram: the protocol takes $S_1, \ldots, S_n$ as input and ends with $S_1 \cap H_k, \ldots, S_n \cap H_k$. Purposes listed for the components: data privacy & domain reduction; reliability against inflation attacks; owner privacy; bandwidth saving & reliability.]

Figure 5.1: Components of the HotItem-ID protocol: HotItem-ID defines how to efficiently compute
an approximate representation of $H_k$ in a distributed fashion. Each player i ($1 \le i \le n$) constructs local
filters to approximately represent his private input set $S_i$, generates one-show tags for marked bits in the
filters, and sends a subset of those one-show tags to the network using anonymous routing. A distributed
approximate distinct element counting protocol aggregates those tags in the network. At the end of the
protocol, all players learn the global filters that approximate $H_k$. At the right side of the figure we list
the purpose of each component.


In our definition of data privacy, the degree of the privacy of an item describes the size of the
'crowd' in which the element is hidden; no adversary can determine which element of the large
crowd was included in the players' private input sets. Formally, we say an element a which
appears in $f_a$ players' private input sets has $\Phi(f_a)$-degree of data privacy if no coalition of
at most $\lambda n$ malicious probabilistic polynomial-time (PPT) adversaries [28] can distinguish the
element a from an indistinguishable set with expected size $\Phi(f_a)$. Thus, for a cold item a, the
larger $\Phi(f_a)$ is, the better protected it is in general.

Efficiency.  We  also  want  the  protocol  to  be  highly  efficient.  In  particular,  to  identify  hot 
items  that  appear  in  at  least  a  certain  fraction  of  the  total  number  of  players’  private  input 
sets,  we  would  like  the  protocol  to  have  constant  per-player  communication  overhead.  We  will 
show  that  our  protocol  achieves  this  property  by  using  a  combination  of  various  approximate 
counting  methods. 


5.2  HotItem-ID  Protocol 

In  Figure  5.1,  we  show  an  overview  of  the  components  of  our  efficient  privacy-preserving  hot  item 
identification  protocol  HotItem-ID  and  their  purposes.  We  first  introduce  the  intuition  behind 
each component. Then, in Section 5.2.6, we describe the full construction of our HotItem-ID
protocol. Once all players learn the hot items in their private datasets, they can run our
HotItem-Pub  protocol  (Section  5.3)  to  securely  publish  the  identified  hot  items. 

5.2.1  Approximate  Heavy-Hitter  Detection 

A naive approach to hot item identification is to count the number of players holding each
possible element, then determine whether that element is hot. However, performing this task
is extremely inefficient for large domains; many applications require use of strings of 1024
bits or more. In most scenarios with a large number of players, it is prohibitively expensive
to count each of these $2^{1024}$ (or more) distinct elements separately. Even if the players only
count the frequency of those elements that appear in their private input sets, the bandwidth
required will often be prohibitive. In order to avoid this inefficient naive approach, we utilize
an approximate heavy-hitter identification scheme [18]. This approximation scheme allows us
to efficiently combine the counting of the elements held by all players.

[Figure 5.2 panels: Player 1's local filter; Player 2's local filter; Player 3's local filter; Global Filters.]

Figure 5.2: In our HotItem-ID protocol, each player i constructs a set of local filters from his private
input set $S_i$ (dark bits are 'hit'). The players then construct global filters using an approximate counting
scheme; if a bit was hit by at least k players, then it is 'hit' (dark) as well. If an element hashes to a
dark bit in each of the global filters, then it is classified as hot.

In this approximate heavy-hitter identification scheme, each player constructs a local filter.
The players then combine their local filters to construct a global filter; this global filter
approximately identifies hot items. We illustrate this process of local and global filter construction in
Figure  5.2. 

First, each player constructs a set of T local filters, which approximately represent his private
input set. Let $h_1, \ldots, h_T : \{0,1\}^{\omega} \to \{1, \ldots, b\}$ be polynomial-wise independent hash functions
with uniformly distributed output, such as cryptographic hash functions [40]. Each filter q
($1 \le q \le T$) for player i ($1 \le i \le n$) is an array of b bits, represented by the bit array
$\{w_{i,q,j}\}_{j \in \{1,\ldots,b\}}$. A local filter bit set to 1 indicates that at least one element in the player's
private input set hashes to that bit; we refer to such bits as hit. Formally, player i ($1 \le i \le n$)
computes each bit $w_{i,q,j} := 1 \iff \exists a \in S_i : h_q(a) = j$. The players then combine their local filters
into a set of global filters, using methods described below; global filters approximately represent
the players' combined private input sets. We will represent each global filter as the bit array
$\{x_{q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$). If at least k players marked bit j of filter q as hit, then let the bit
in the global filter $x_{q,j} := 1$ ($1 \le q \le T$, $1 \le j \le b$). Otherwise, let $x_{q,j} := 0$. Given these global
filters, a is hot with high probability if $x_{1,h_1(a)} = \cdots = x_{T,h_T(a)} = 1$. For statistical analysis of
this approximation scheme, see Section 5.4.1.
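The filter construction above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: each $h_q$ is instantiated (hypothetically) as SHA-256 over the string "q|a" reduced to {1, ..., b}, filters are stored sparsely as the set of hit bit indices, and the global filters are computed by exact counting, standing in for the approximate, tag-based counting described in the following sections.

```python
import hashlib
from collections import Counter

def h(q: int, a: str, b: int) -> int:
    # Hypothetical instantiation of h_q: SHA-256 over "q|a", reduced to {1, ..., b}.
    digest = hashlib.sha256(f"{q}|{a}".encode()).digest()
    return int.from_bytes(digest, "big") % b + 1

def local_filters(S: set[str], T: int, b: int) -> list[set[int]]:
    # Sparse form of {w_{i,q,j}}: entry q-1 holds every bit j with w_{i,q,j} = 1.
    return [{h(q, a, b) for a in S} for q in range(1, T + 1)]

def global_filters(all_local: list[list[set[int]]], T: int, k: int) -> list[set[int]]:
    # Sparse form of {x_{q,j}}: bits hit by at least k players. Each player's filter
    # is a set, so a player contributes at most once to any bit's count.
    x = []
    for q in range(T):
        counts = Counter(j for w in all_local for j in w[q])
        x.append({j for j, c in counts.items() if c >= k})
    return x

def is_hot(a: str, x: list[set[int]], T: int, b: int) -> bool:
    # a is classified hot iff x_{q, h_q(a)} = 1 for every filter q = 1, ..., T.
    return all(h(q, a, b) in x[q - 1] for q in range(1, T + 1))

# Usage: "worm" is held by five players, every other element by only one.
sets = [{"worm", f"noise-{i}"} for i in range(5)] + [{"rare"}]
T, b, k = 4, 2**32, 3
x = global_filters([local_filters(S, T, b) for S in sets], T, k)
print(is_hot("worm", x, T, b), is_hot("rare", x, T, b))
```

A cold element is misclassified only if every one of its T bits lands on a bit that k players hit; with b this large that is overwhelmingly unlikely, and the false-positive rate for realistic b and T is what Section 5.4.1 analyzes.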


5.2.2  One-Show  Tags 

In order to construct the global filters, we must count how many players hit each bit in their
local filters. Malicious players may attempt to affect this count by 'voting' multiple times for
a bit; this inflation attack could lead to elements being erroneously marked as hot. If players
later publish their hot items, the attacker would learn private information through this attack.
Most techniques that ensure players 'vote' at most once would reveal an unacceptable level of
information about each player's private filters. The voting process must therefore be anonymous
to prevent the adversaries from learning information about any particular player's private input.
We ensure that each player can 'vote' only once, without compromising their privacy, with
anonymous one-show tags. If a player set a bit, he constructs a tag for that bit; the players
then count the number of valid tags for each bit to construct the global filters. We require that
one-show tags possess the following properties: (a) no PPT adversary can construct more than
one valid tag for any bit with non-negligible probability; (b) any PPT player can detect tags that
are invalid, or not associated with a particular bit, with overwhelming probability; (c) for every
bit, the tags constructed by any two players are distinct, with overwhelming probability; (d) given
a tag, no PPT adversary can distinguish the player that constructed it with probability non-negligibly
different from $1/n$, where n is the number of honest players.

Let  all  players  share  a  group  public  key  for  tag  verification.  Each  player  holds  an  individual 
private  key,  used  to  construct  tags.  The  message  and  one-show  parameters  vary  by  bit,  to 
ensure  that  tags  constructed  for  one  bit  are  not  confused  with  those  for  another  bit.  We  denote 
the algorithm for construction of a one-show tag OST(group public key, private signing key,
message,  one-show  parameter).  We  provide  a  construction  of  such  tags  in  Appendix  B.3.  Note 
that  this  cryptographic  tool  is  communication-efficient;  our  construction  requires  only  1,368 
bits. 


5.2.3  Approximate  Distinct  Element  Counting 

We  may  now  securely  identify  hot  items  by  utilizing  global  filters  and  anonymous  one-show 
tags.  However,  exactly  counting  the  number  of  valid,  distinct  one-show  tags  is  inefficient;  the 
players  would  have  to  collect  every  single  tag.  To  ensure  that  no  tag  is  counted  twice,  the 
players  would  also  have  to  store  information  about  every  tag. 

We can perform the task of approximate counting of distinct tags through an efficient algorithm
for approximate distinct element counting [4]. In our protocol, however, we gain even
more  efficiency  by  estimating  directly,  through  modified  use  of  this  approximation,  whether 
there  are  more  or  fewer  than  k  tags  for  a  bit;  this  is  the  task  that  must  be  performed  to 
construct  the  global  filters. 

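To approximate the number of distinct tags, we need t specific tags from the set. Let
$\mathcal{H} : \{0,1\}^* \to \{0,1\}^{\omega}$ be a collision-resistant cryptographic hash function [40]. Let $v_1, \ldots, v_t$
represent the valid one-show values that, when hashed with $\mathcal{H}$, have the smallest values of any tags
in the set, ordered so that $\mathcal{H}(v_1) < \cdots < \mathcal{H}(v_t)$. We estimate the total number of distinct tags as
$\frac{t \cdot 2^{\omega}}{\mathcal{H}(v_t)}$ [4]. (If N distinct tags hash uniformly into $\{0,1\}^{\omega}$, the t-th smallest hash value is
roughly $t \cdot 2^{\omega} / N$, which this estimate inverts.) To increase the accuracy of this approximation, one
may choose α independent hash functions and perform this estimation with each such function; the
resulting estimate is the median of the approximations obtained with each hash function. We have
found α := 1, t := 25 sufficient in practice.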
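The estimator $t \cdot 2^{\omega} / \mathcal{H}(v_t)$ can be sketched as follows. This is a minimal Python illustration assuming ω = 64 and a hypothetical family of α hash functions built by prefixing a salt to SHA-256 and truncating to ω bits; tags are arbitrary byte strings standing in for one-show values, and validity checks are omitted.

```python
import hashlib
import statistics

OMEGA = 64  # hypothetical choice of omega (hash output length in bits)

def H(salt: int, tag: bytes) -> int:
    # Hypothetical hash family H_1, ..., H_alpha: salted SHA-256 truncated to OMEGA bits.
    digest = hashlib.sha256(salt.to_bytes(4, "big") + tag).digest()
    return int.from_bytes(digest[: OMEGA // 8], "big")

def estimate_distinct(tags: list[bytes], t: int, alpha: int = 1) -> float:
    estimates = []
    for salt in range(alpha):
        # Duplicate tags hash identically, so the set removes them (distinct counting).
        smallest = sorted({H(salt, tag) for tag in tags})[:t]
        if len(smallest) < t:
            estimates.append(float(len(smallest)))  # fewer than t distinct tags: exact count
        else:
            # t * 2^omega / H(v_t), guarding against the (negligible) zero hash value
            estimates.append(t * 2**OMEGA / max(smallest[-1], 1))
    return statistics.median(estimates)

tags = [f"tag-{i}".encode() for i in range(10_000)]
print(round(estimate_distinct(tags + tags, t=25, alpha=5)))  # duplicates do not inflate the count
```

With t = 25 a single estimate is typically within a few tens of percent of the true count of 10,000; taking the median over α hash functions narrows the spread further.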

To perform the task of determining whether there are at least k distinct tags corresponding
to a bit, the players can perform a full approximation. However, we have designed a more
efficient variant for this particular problem. Note that, if there exist t tags such that the values
of their hashes are at most $\frac{t \cdot 2^{\omega}}{k}$, the approximation scheme we describe above will conclude that
there are at least k total distinct tags. The players thus can instead perform this more efficient
collection task, often without even examining every tag. For example, any tag with hash value greater
than $\frac{t \cdot 2^{\omega}}{k}$ may be immediately discarded without affecting the approximation.
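The shortcut can be made concrete with a small Python sketch that uses exact integer and rational arithmetic to avoid rounding at the boundary. The function names are hypothetical; the second function recomputes the full estimate only to show that both tests agree (assuming k ≥ t, as in the parameter choices above).

```python
from fractions import Fraction

def enough_small_hashes(hash_values, t, k, omega, gamma=0.0):
    # Count distinct hash values v with v <= (1 + gamma) * t * 2^omega / k,
    # compared exactly as v * k <= (1 + gamma) * t * 2^omega. Larger values are
    # discarded on sight, so the whole set never needs to be sorted.
    bound = (1 + Fraction(gamma)) * t * 2**omega
    small = {v for v in hash_values if v * k <= bound}
    return len(small) >= t

def full_estimate_at_least_k(hash_values, t, k, omega):
    # Full approximation: is t * 2^omega / H(v_t) >= k? (Assumes k >= t, so a set
    # with fewer than t distinct values is exactly counted and falls below k.)
    distinct = sorted(set(hash_values))
    if len(distinct) < t:
        return len(distinct) >= k
    return t * 2**omega >= k * distinct[t - 1]

# omega = 8, t = 2, k = 4: the cut-off is 2 * 256 / 4 = 128.
print(enough_small_hashes([10, 200, 120], 2, 4, 8), enough_small_hashes([10, 200, 130], 2, 4, 8))
```

With γ = 0 the two tests are equivalent: at least t distinct values fall at or below $t \cdot 2^{\omega}/k$ exactly when the t-th smallest does, which is exactly when the estimate reaches k.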

In many situations, there is a large gap between the frequency of hot items and cold items.
For example, worm attacks will be detected a large number of times by network monitors, whereas
normal traffic is far more varied. Thus, to gain greater efficiency in detecting bits with at least
k one-show tags, we may adjust the t-collection protocol to follow the algorithm of [54]: each
player attempts to collect t tags with hash values of at most $(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$, where γ is a constant based
on the size of the 'gap' between the frequency of hot and cold items. We gain efficiency in such
situations by reducing t, while retaining accuracy. In practice, we have found γ := 0.2, t := 5
sufficient. This scheme is identical to the above scheme if γ := 0.
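A sketch of this gap-based collection rule in Python, assuming ω = 64 and a hypothetical tag hash (the first 8 bytes of SHA-256). It stops as soon as t distinct qualifying tags are found, which is where the savings of a small t come from.

```python
import hashlib

OMEGA = 64  # hypothetical choice of omega

def tag_hash(tag: str) -> int:
    # Hypothetical H: the first OMEGA bits of SHA-256.
    return int.from_bytes(hashlib.sha256(tag.encode()).digest()[: OMEGA // 8], "big")

def gap_collect(tags, t: int, k: int, gamma: float) -> bool:
    # Succeed once t distinct tags with hash at most (1 + gamma) * t * 2^omega / k are seen.
    bound = (1 + gamma) * t * 2**OMEGA / k
    found = set()
    for tag in tags:
        if tag not in found and tag_hash(tag) <= bound:
            found.add(tag)
            if len(found) == t:
                return True  # early exit: the remaining tags need not be examined
    return False

hot_bit = [f"tag-{i}" for i in range(2000)]    # many distinct tags for this bit
cold_bit = ["tag-0", "tag-1", "tag-2"] * 100   # few distinct tags, heavily repeated
print(gap_collect(hot_bit, t=5, k=100, gamma=0.2), gap_collect(cold_bit, t=5, k=100, gamma=0.2))
```

Repeating a tag does not help a cold bit, since only distinct tags count; this is why the count must be over one-show tags, so that it reflects the number of players rather than the number of messages.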
Protocol: t-Collection

Input: All n players know $\mathcal{H}$, t, k, γ, and the diameter of the network, D. Each player i ($1 \le i \le n$) holds
a set $U'_{i,0} = U_{i,0}$ of one-show tags for some specific bit.

Output: Each player outputs success if he collected t valid one-show tags with one-show value v such
that $\mathcal{H}(v) \le (1+\gamma)\frac{t \cdot 2^{\omega}}{k}$; otherwise, he outputs failure.

1. For ℓ = 1, ..., D, each player i ($1 \le i \le n$):

(a) Sends the set of new small tags $U_{i,\ell-1} \cap U'_{i,\ell-1}$ to all of his neighbors.

(b) If $|U'_{i,\ell-1}| = t$, then player i stops participating in the protocol and goes to Step 2.

(c) Receives a set of tags as a result of Step 1(a); let the set $U_{i,\ell}$ be those which are valid tags
for the correct bit, with one-show value v such that $\mathcal{H}(v) \le (1+\gamma)\frac{t \cdot 2^{\omega}}{k}$.

(d) Calculates $U'_{i,\ell} = U'_{i,\ell-1} \cup U_{i,\ell}$, retaining at most t tags.

2. Each player i ($1 \le i \le n$) outputs success if $|U'_{i,\ell}| = t$ for the last round ℓ in which he participated, and otherwise outputs failure.

Figure 5.3: t-collection protocol
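The round structure of Figure 5.3 can be simulated with a short synchronous sketch in Python. Assumptions, not part of the protocol specification: each tag is represented by its hash value, all tags are already valid for the bit in question (signature checks are omitted), the graph is a hypothetical adjacency dictionary, and `bound` stands for $(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$.

```python
def t_collection(neighbors: dict[int, list[int]], initial: dict[int, set[int]],
                 t: int, bound: int, D: int) -> dict[int, str]:
    # U'_{i,0}: the small tags each player starts with (at most t are retained).
    kept = {i: set(sorted(v for v in initial[i] if v <= bound)[:t]) for i in neighbors}
    new = {i: set(kept[i]) for i in neighbors}  # tags to forward in the next round
    active = set(neighbors)
    for _ in range(D):
        inbox = {i: set() for i in neighbors}
        for i in active:                                   # step (a): forward newly retained tags
            for nb in neighbors[i]:
                inbox[nb] |= new[i]
        active = {i for i in active if len(kept[i]) < t}   # step (b): players holding t tags stop
        new = {i: set() for i in neighbors}
        for i in active:                                   # steps (c)-(d): keep small unseen tags, up to t
            for v in sorted(inbox[i] - kept[i]):
                if v <= bound and len(kept[i]) < t:
                    kept[i].add(v)
                    new[i].add(v)
    return {i: "success" if len(kept[i]) == t else "failure" for i in neighbors}

# Usage: a path 0-1-2-3 (diameter 3); tag 200 exceeds the bound and is ignored.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(t_collection(line, {0: {5}, 1: {50}, 2: {200}, 3: {7, 9}}, t=3, bound=100, D=3))
```

A player that reaches t tags forwards them before stopping, so its neighbours also receive at least t small tags; a tag needs one round per hop, which is why D rounds suffice.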

5.2.4  Anonymous  Communication 

We  must  now  apply  our  tools  to  a  network  structure  to  complete  the  protocol.  In  order  to 
disassociate  the  anonymous  one-show  tags  each  player  constructs  from  their  location  in  the 
network,  each  player  anonymously  routes  his  tags  to  p  random  players.  The  constant  p  can  be 
varied  to  achieve  greater  or  lesser  degrees  of  robustness;  at  least  one  participating  player  should 
receive each player's tags with high probability. We require an anonymous routing scheme that
allows any player to send a message to a uniformly randomly selected player without revealing
the source to any intermediate routing node. As this is all we require, simple and lightweight
schemes can be used; some previously proposed anonymous networking schemes include
[11, 13, 42, 48, 49].

5.2.5  Distributed  One-Show  Tag  Collection 

The  players,  once  they  have  anonymously  received  their  initial  set  of  tags,  then  count  the 
number  of  tags  associated  with  each  bit  of  the  filters.  There  are  many  different  ways  in  which 
they  can  accomplish  this  task,  such  as  through  the  use  of  gossip  networks  or  centralized  servers; 
the  ideal  method  depends  on  the  particular  network  in  which  it  is  employed.  We  suggest  two 
distributed,  secure,  and  efficient  protocols  that  require  only  minimal  network  capabilities:  each 
player  must  be  able  to  send  messages  to  each  of  its  neighbors,  who  form  a  connected  graph. 
Because our protocols require only these minimal network capabilities, networks that provide more
may employ even more efficient schemes from previous work [47, 50, 53]. For clarity of presentation,
our protocols assume synchronous communication, but they can be adapted to an asynchronous model
by modifying the termination conditions.

Our  two  protocols  for  distributed  one-show  tag  collection  function  as  follows:  (a)  collecting 
t  one-show  tags  (with  sufficiently  small  hash  value)  to  approximate  whether  there  are  at  least 
k valid, distinct tags for a bit; or (b) finding the t one-show tags with the smallest hash values
to  approximate  how  many  valid,  distinct  tags  there  are  for  a  bit.  Note  that  these  protocols  can 
be  executed  in  parallel  for  each  bit  of  the  global  filters. 

We give an efficient, robust protocol for (a), the t-collection task, in Figure 5.3. Each player
maintains a set of at most t valid tags which have hash values at most $(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$. Upon learning
a new tag that will be added to his set, a player sends the new tag to all of his neighbors. When
a player has collected t tags with hash values of at most $(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$, and sent each of these
tags to his neighbors, he ends his participation in the t-collection protocol. If there do not
exist t such small-hash-value tags, the protocol must continue until it converges. The players
must ensure that information travels from one side of the network to the other; this requires D
rounds, where D is the diameter of the network.†

Protocol: t-Minimum Aggregation

Input: All n players know $\mathcal{H}$, t, and the diameter of the network D. Each player i ($1 \le i \le n$) holds a set
$U'_{i,0} = U_{i,0}$ of one-show tags for some specific bit.

Output: Each player learns the set U', those t tags with the smallest hashes of their one-show values.

1. For ℓ = 1, ..., D, each player i ($1 \le i \le n$):

(a) Sends the set of new small tags $U_{i,\ell-1} \cap U'_{i,\ell-1}$ to all of his neighbors.

(b) Receives a set of tags as a result of Step 1(a); let those which are valid one-show tags for the
correct bit form the set $U_{i,\ell}$.

(c) Calculates $U'_{i,\ell}$ to be those t tags in $U'_{i,\ell-1} \cup U_{i,\ell}$ with the smallest hashes of their one-show
values $v_1, \ldots, v_t$, so that $\mathcal{H}(v_1) < \cdots < \mathcal{H}(v_t)$.

2. Each player i ($1 \le i \le n$) outputs the set $U' = U'_{i,D}$.

Figure 5.4: t-minimum value aggregation protocol

We  give  an  efficient,  robust  protocol  for  (b),  the  t-minimum  task  in  Figure  5.4.  Like  the 
t-collection  protocol,  each  player  passes  on  small-valued  valid  tags  to  his  neighbors,  retaining 
the  t  tags  with  the  smallest  hash  values.  This  process  continues  until  it  converges  (at  D  rounds) 
and  all  players  have  learned  the  t  tags  with  the  smallest  hash  values.  This  protocol  is  based  on 
that  of  [46]. 
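The t-minimum aggregation of Figure 5.4 can be simulated in the same style. This Python sketch again assumes tags are represented by their (already validated) hash values and the network by a hypothetical adjacency dictionary; it truncates each player's initial set to its t smallest values, which does not change the output.

```python
def t_minimum(neighbors: dict[int, list[int]], initial: dict[int, set[int]],
              t: int, D: int) -> dict[int, list[int]]:
    kept = {i: set(sorted(initial[i])[:t]) for i in neighbors}  # t smallest known so far
    new = {i: set(kept[i]) for i in neighbors}                  # changes to forward next round
    for _ in range(D):
        inbox = {i: set() for i in neighbors}
        for i in neighbors:                                     # step (a): forward new small tags
            for nb in neighbors[i]:
                inbox[nb] |= new[i]
        for i in neighbors:                                     # steps (b)-(c): keep the t smallest
            merged = set(sorted(kept[i] | inbox[i])[:t])
            new[i] = merged - kept[i]
            kept[i] = merged
    return {i: sorted(kept[i]) for i in neighbors}

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path of diameter 3
result = t_minimum(line, {0: {40, 3}, 1: {25}, 2: {8, 60}, 3: {1, 90}}, t=3, D=3)
print(result)
```

A value among the global t smallest is never evicted once a player holds it, so it advances one hop per round; after D rounds every player holds the same t values and can compute the estimate from the largest of them.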

5.2.6  Putting  HotItem-ID  to  Work 

We now outline our HotItem-ID protocol in Figure 5.5. Let $sk_i$ be player i's ($1 \le i \le n$)
private key, allowing him to construct one-show tags. These keys can be distributed by a trusted
'group manager', by a mutually distrustful group of players acting as such, or by all players
acting in concert (see Appendix B.3).

In Step 1, each player i ($1 \le i \le n$) constructs T local heavy-hitter identification filters
$\{w_{i,q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$) from his private input set. He then constructs a one-show tag
for each bit in the filters marked as hit ($w_{i,q,j} = 1$); for a specific construction of the one-show
value, see Appendix B.3. In Step 2, player i anonymously sends the tags for counting to p
randomly chosen players. (If the players will be utilizing t-collection for counting, they may
save bandwidth by only sending those tags where the hash ($\mathcal{H}$) of the one-show value is at most
$(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$.)

To construct the global filters $\{x_{q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$), the players must determine, for
each bit in the global filters, whether at least k players hit that bit in their local filters. In Step 3,
to efficiently and securely perform this task, the players utilize either the t-collection protocol
(Figure 5.3) or the t-minimum aggregation protocol (Figure 5.4). Using one of these protocols, the
players approximate whether there were at least k valid, distinct tags constructed for each bit
(for greater accuracy, the players perform this approximation α times, taking the median of
the approximations). If the players conclude that there were at least k valid, distinct tags
constructed for bit j of filter q ($1 \le j \le b$, $1 \le q \le T$), then they set $x_{q,j} := 1$; else, they set
$x_{q,j} := 0$.

†Note that this is optimal: by the definition of network diameter, information cannot travel
between every pair of players in fewer than D rounds.

Protocol: HotItem-ID

Input: There are n players, $\lambda n$ maliciously colluding, each with a private input set $S_i$. Each player $i \in \{1, \ldots, n\}$
holds a secret key $sk_i$ allowing him to construct one-show tags, as well as a common public key pk for tag
verification. These keys can be chosen by a trusted administrator or by a distrustful collective [3]. $h_1, \ldots, h_T$
are independently chosen cryptographic hash functions with range $\{1, \ldots, b\}$. $\mathcal{H}$ is a cryptographic hash function
with range $\{0,1\}^{\omega}$. p, b, T, t, α, γ are parameters known to all participants.

Output: Each player i ($1 \le i \le n$) obtains the set $P_i \subseteq S_i$, such that each element $a \in P_i$ is a hot item. All
players hold the approximate heavy-hitter filters $\{x_{q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$).

1. Each player i ($1 \le i \le n$) constructs T local approximate heavy-hitter identification filters
$\{w_{i,q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$) from his private input set $S_i$: if $\exists a \in S_i : h_q(a) = j$, then $w_{i,q,j} = 1$; otherwise, $w_{i,q,j} = 0$.

2. Each player i ($1 \le i \le n$), for each bit j of filter q ($1 \le j \le b$, $1 \le q \le T$) such that $w_{i,q,j} = 1$:

(a) constructs a one-show tag OST(pk, $sk_i$, q || j, one-show-value);

(b) anonymously routes the tag to p randomly chosen players (if the players will be utilizing
t-collection in Step 3, he does not send the tag unless $\mathcal{H}$ of the one-show value is at most $(1+\gamma)\frac{t \cdot 2^{\omega}}{k}$).

3. For each bit j of filter q ($1 \le j \le b$, $1 \le q \le T$), all players 1, ..., n perform exactly one of the following
tasks:

• perform t-collection, with α independent hash functions, on the tags received in Step 2, attempting to
collect, for each hash function $\mathcal{H}$, t valid one-show tags such that each one-show value v satisfies
$\mathcal{H}(v) \le (1+\gamma)\frac{t \cdot 2^{\omega}}{k}$. If t-collection was successfully performed for the majority of hash functions, set
$x_{q,j} = 1$. Otherwise, $x_{q,j} = 0$.

• perform t-minimum value aggregation, with α independent hash functions, on the tags received in
Step 2. If v is the t-th minimum hash value, approximate that there are $\frac{t \cdot 2^{\omega}}{v}$ total tags; the median of
each such approximation forms the final approximation. If this process determines that there were
at least k tags, set $x_{q,j} = 1$. Otherwise, $x_{q,j} = 0$.

4. Each player i calculates his output set $P_i$: for each element $a \in S_i$, if $x_{1,h_1(a)} = \cdots = x_{T,h_T(a)} = 1$, then $a \in P_i$.

Figure 5.5: HotItem-ID protocol, for identifying the hot items in each player's private input.

Once they have constructed the global filters $\{x_{q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$), each player can
utilize these filters, as shown in Step 4, to determine whether each element of his input is a hot
item. If, for an element a, $x_{1,h_1(a)} = \cdots = x_{T,h_T(a)} = 1$, then a is a hot item with high probability.
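Putting the pieces together, the following Python sketch runs Steps 1 to 4 of Figure 5.5 centrally, with the network, anonymous routing, and tag verification all omitted. The tag function is a hypothetical stand-in: HMAC-SHA256 of "q||j" under a per-player key, which is deterministic per (player, bit), so a repeated vote collapses to one tag, but which has none of the public verifiability or anonymity of the construction in Appendix B.3. ω = 64 and the hash constructions are likewise assumptions; Step 3 uses the t-minimum estimate with a median over α hash functions.

```python
import hashlib
import hmac
import statistics
from collections import defaultdict

OMEGA = 64  # hypothetical choice of omega

def h(q: int, a: str, b: int) -> int:
    # Hypothetical h_q with range {1, ..., b}.
    return int.from_bytes(hashlib.sha256(f"{q}|{a}".encode()).digest(), "big") % b + 1

def toy_tag(sk: bytes, q: int, j: int) -> bytes:
    # Toy stand-in for OST(pk, sk_i, q || j, ...): one fixed value per (player, bit).
    # NOT unforgeable, publicly verifiable, or anonymous.
    return hmac.new(sk, f"{q}||{j}".encode(), hashlib.sha256).digest()

def H(salt: int, tag: bytes) -> int:
    # Hypothetical hash family: salted SHA-256 truncated to OMEGA bits.
    return int.from_bytes(hashlib.sha256(bytes([salt]) + tag).digest()[: OMEGA // 8], "big")

def hotitem_id(sets, keys, T, b, k, t, alpha):
    tags = defaultdict(set)  # (q, j) -> distinct tags received for that bit
    for S, sk in zip(sets, keys):                  # Steps 1-2 (routing omitted)
        for q in range(1, T + 1):
            for j in {h(q, a, b) for a in S}:      # bits with w_{i,q,j} = 1
                tags[(q, j)].add(toy_tag(sk, q, j))
    x = set()                                      # Step 3: bits with x_{q,j} = 1
    for bit, bit_tags in tags.items():
        estimates = []
        for salt in range(alpha):
            mins = sorted(H(salt, tag) for tag in bit_tags)[:t]
            estimates.append(len(mins) if len(mins) < t else t * 2**OMEGA / max(mins[-1], 1))
        if statistics.median(estimates) >= k:
            x.add(bit)
    # Step 4: P_i holds the elements of S_i that pass all T global filters.
    return [{a for a in S if all((q, h(q, a, b)) in x for q in range(1, T + 1))} for S in sets]

sets = [{"worm", f"noise-{i}"} | ({"rare"} if i < 4 else set()) for i in range(30)]
keys = [bytes([i + 1]) * 16 for i in range(30)]
P = hotitem_id(sets, keys, T=3, b=2**32, k=10, t=5, alpha=3)
print(sorted(set().union(*P)))
```

Here "worm" (30 holders) should be reported by every player, while "rare" (4 holders) and the noise elements should not; the separation relies on the approximate count, so elements whose frequency is close to k can fall on either side.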

5.3  Hot  Item  Publication  Protocol 

In  this  section,  we  present  our  HotItem-Pub  protocol.  This  protocol  utilizes  the  HotItem-ID 
protocol,  so  that  each  player  may  identify  his  hot  items,  but  also  ensures  that  the  players  may 
securely  and  efficiently  publish  their  common  hot  items. 


5.3.1  Commitment  to  Foil  Attacks 

When players publish hot items, we must prevent two attacks by malicious players: (1) suppressing
hot items, so that players do not learn them; (2) fooling honest players into accepting cold
items as hot. As we have shown, our HotItem-ID protocol (see Section 5.2) effectively allows
all players to identify the hot items in their private input sets, even in the presence of malicious
players. By using the HotItem-ID protocol to identify hot items, we need only address attacks
on the publication process itself.
Common Protocol Outline. Each of the variants of our HotItem-Pub protocol prevents
both  item  suppression  and  fooling  attacks;  however,  they  achieve  different  levels  of  owner 
privacy.  For  clarity,  we  will  first  describe  the  basic  outline  for  the  protocol,  which  is  common 
between  all  variants.  In  the  HotItem-Pub  protocol,  the  players  follow  four  steps: 

1.  Commitment.  Each  player  constructs  a  computationally-binding  and  computationally- 
hiding  commitment  to  their  private  input  set  [40];  the  exact  form  of  the  commitment 
varies  according  to  the  level  of  owner-privacy  desired.  This  is  later  used  to  prove  (by 
opening  the  commitment)  that  a  published  hot  item  was  held  by  some  player  before  the 
global  filters  were  constructed,  without  revealing  any  information  at  this  stage  of  the 
protocol. If owner-privacy is desired, the players then send each of these commitments to
p random players. These commitments are then distributed to all players.

2. HotItem-ID. The players then execute the HotItem-ID protocol, described in Section 5.2.
Each player learns, with high probability, which of the elements of his private
input set are hot, even in the presence of malicious players. In addition, each player also
obtains the global filters $\{x_{q,j}\}_{j \in \{1,\ldots,b\}}$ ($1 \le q \le T$) used to identify his hot items.

3.  Publication.  Each  player,  for  each  hot  item  in  his  input,  constructs  an  opening  for  the 
commitment  to  that  element  (see  the  Commitment  step).  If  owner-privacy  is  desired,  the 
players then send each of these element/opened commitment pairs to p random players.
All players then use a redundant element distribution protocol, ElementDist, to efficiently
and robustly publish the element/opened commitment pairs. Such a protocol follows a simple
rule: when a player receives a previously-unseen hot item, he: (1) using the
opened  commitment  attached  to  the  hot  item,  checks  that  he  received  a  valid  commitment 
to  that  item  during  the  commitment  phase  of  the  protocol;  (2)  if  it  passes  the  check,  sends 
the  hot  item  (and  associated  commitment  opening)  to  each  of  his  neighbors  and  retains 
it  himself  in  the  set  P.  By  following  this  simple  protocol,  duplicate  publications  of  hot 
items  are  efficiently  suppressed,  while  ensuring  that  all  players  receive  each  hot  item  in 
their  provisional  set  P. 

4. Verification. As a result of publication, each player obtains a provisional set of elements
P, each of which may be hot. For each such item $a \in P$, the player checks that it passes
the global hot item identification filters; if $x_{1,h_1(a)} = \cdots = x_{T,h_T(a)} = 1$, then it is truly a hot
item.
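The ElementDist rule can be sketched as an event-driven flood in Python. The commitment check of step (1) is abstracted as a hypothetical callback `valid_opening` (in the protocol it would verify the opening against a commitment received during the Commitment step); the graph representation and the message counter are illustrative additions.

```python
from collections import deque

def element_dist(neighbors, published, valid_opening):
    """Flood (element, opening) pairs; return each player's provisional set P and the message count.

    neighbors:     dict player -> list of neighbouring players
    published:     dict player -> list of (element, opening) pairs that player injects
    valid_opening: hypothetical callback standing in for the commitment check of step (1)
    """
    P = {i: set() for i in neighbors}
    queue = deque((i, pair) for i, pairs in published.items() for pair in pairs)
    messages = 0
    while queue:
        i, (element, opening) = queue.popleft()
        if element in P[i] or not valid_opening(element, opening):
            continue  # already seen, or the opening does not verify: drop without forwarding
        P[i].add(element)
        for nb in neighbors[i]:  # step (2): forward to every neighbour
            queue.append((nb, (element, opening)))
            messages += 1
    return P, messages

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
published = {0: [("worm", "ok")], 3: [("worm", "ok"), ("fake", "bad")]}
P, messages = element_dist(line, published, lambda e, o: o == "ok")  # toy validity check
print(P, messages)
```

Each player forwards a given hot item at most once, so publishing one item costs one message per directed edge, no matter how many players publish it; the Verification step would then filter P against the global filters.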


5.3.2  Putting  HotItem-Pub  to  Work 

We now describe the details of each variant of our HotItem-Pub protocol (with varying degrees
of owner privacy), as well as the commitments and openings used to meet each definition. Note
that  to  increase  the  level  of  owner  privacy  provided,  the  players  must  utilize  more  bandwidth. 
We  formally  describe  the  HotItem-Pub  protocol  in  Figure  5.6. 


No  Owner  Privacy.  If  the  players  are  not  concerned  about  owner  privacy,  they  may  simply 
commit to their private input sets using Merkle hash trees. A Merkle hash tree is a data structure
allowing a player to produce a constant-sized commitment to an ordered set S. Later, any
player holding S may produce a commitment opening: a verifiable proof (given the commitment)
that an element $a \in S$ [41].
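A minimal Merkle hash tree commitment can be sketched in Python with `hashlib`. Assumptions beyond the text: SHA-256 as the hash, a random fixed-length nonce per leaf so that the root is also intended to hide the set (as the Commitment step requires), 0x00/0x01 prefixes to separate leaf and inner-node hashes, and duplication of the last node on odd levels. The names `commit` and `verify` are hypothetical.

```python
import hashlib
import secrets

NONCE_LEN = 16  # fixed-length per-leaf nonce

def _leaf(nonce: bytes, element: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + nonce + element).digest()  # 0x00 prefix: leaf

def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()     # 0x01 prefix: inner node

def commit(elements: list[bytes]):
    """Return (root, openings), where openings[k] = (nonce, path) for elements[k]."""
    if not elements:
        raise ValueError("this sketch does not commit to an empty set")
    nonces = [secrets.token_bytes(NONCE_LEN) for _ in elements]
    level = [_leaf(n, e) for n, e in zip(nonces, elements)]
    paths = [[] for _ in elements]
    pos = list(range(len(elements)))  # index of each element's ancestor in the current level
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # odd level: pair the last node with itself
        for k, p in enumerate(pos):
            sibling = p ^ 1
            paths[k].append((level[sibling], sibling < p))  # (hash, sibling is on the left)
            pos[k] = p // 2
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], list(zip(nonces, paths))

def verify(root: bytes, element: bytes, opening) -> bool:
    nonce, path = opening
    acc = _leaf(nonce, element)
    for sibling, sibling_left in path:
        acc = _node(sibling, acc) if sibling_left else _node(acc, sibling)
    return acc == root

S = [b"10.0.0.1", b"worm-signature", b"alice@example.com"]
root, openings = commit(S)  # the 32-byte root is the constant-sized commitment
print(verify(root, S[1], openings[1]))
```

The opening for one element is its nonce plus about $\log_2 |S|$ sibling hashes, so publishing a hot item reveals nothing about the other committed elements beyond what those hashes hide.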


Correlated  Owner  Privacy.  In  correlated  owner  privacy,  the  players  wish  to  ensure  that 
the  elements  of  their  private  input  sets  cannot  be  traced  to  them,  but  do  not  care  if  an  adversary 
can determine that some pair of elements came from the same player. To achieve this level of
privacy, the players may also use Merkle hash tree commitments [41]. The protocol only differs
from the non-owner-private version in that the players anonymously send their commitments
and hot items to random players before they are distributed in the commitment and publication
phases of the protocol. This ensures that the elements cannot be associated with any player.


Uncorrelated  Owner  Privacy.  In  uncorrelated  owner  privacy,  the  players  wish  to  ensure 
that the elements of their private input sets cannot be traced to them, and also wish to prevent an
adversary from determining that some pair of elements came from the same player. When the
players desire this level of privacy, they cannot utilize the efficient Merkle commitments, but
instead must commit to each element separately. As the commitments, as well as the hot items,
are sent independently and anonymously to random players before publication, they cannot be
linked with the player who constructed them. In addition, as elements have independent
commitments, no player can gain an advantage in determining whether one player held both elements.
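Per-element commitments can be as simple as a hash of the element with a fresh random nonce. The Python sketch below is one hypothetical instantiation (SHA-256 in the random-oracle style; this chapter does not fix the construction here): because each element gets its own nonce, two commitments by the same player, even to the same element, share nothing that an adversary could use to link them.

```python
import hashlib
import hmac
import secrets

def commit_element(element: bytes) -> tuple[bytes, bytes]:
    # Independent hash commitment H(r || a) with a fresh 32-byte nonce r.
    # Returns (commitment, opening nonce).
    r = secrets.token_bytes(32)
    return hashlib.sha256(r + element).digest(), r

def open_element(commitment: bytes, element: bytes, r: bytes) -> bool:
    # The fixed nonce length keeps r || a unambiguous; compare_digest avoids timing leaks.
    return len(r) == 32 and hmac.compare_digest(hashlib.sha256(r + element).digest(), commitment)

S = [b"worm-signature", b"10.0.0.1"]
commitments = [commit_element(a) for a in S]  # each one sent separately and anonymously
print(all(open_element(c, a, r) for a, (c, r) in zip(S, commitments)))
```

The cost relative to the Merkle variant is bandwidth: one commitment per element instead of one constant-sized root per player.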


5.4  Analysis 

Now we proceed to analyze the security and performance of our HotItem-ID and HotItem-Pub
protocols.


5.4.1  HotItem-ID  Correctness 

In  Theorem  26  we  prove  that  our  HotItem-ID  protocol  functions  correctly:  hot  items  are 
identified  as  hot  and  cold  items  are  identified  as  cold  with  high  probability.  Note  that  the  filter 
sizes  b,  T  must  be  chosen  as  described  in  the  theorem  for  our  guarantees  to  hold,  though  they 
may  be  smaller  in  practice. 

Theorem 26. Given the false positive rate δ+ and the false negative rate δ−, error bounds ε
and β, and the upper limit λ on the number of malicious participants, let b, t, T, ρ be chosen
as follows: t := [p], ρ := O (^lg α := O /ml, b and T are chosen to minimize b × T, and at
the same time, satisfy - - 1" ^ < δ+.

In the HotItem-ID protocol, with probability at least 1 − δ+, every element a that appears
in f_a < βk players' private input sets is not identified as a k-threshold hot item.

In the HotItem-ID protocol, with probability at least 1 − δ−, every element a that appears
in f_a > players' private input sets is identified as a k-threshold hot item.

We defer the proof of Theorem 26 to Appendix B.2.1.
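The identification test that these guarantees are about is a per-filter lookup: an element is treated as hot only if, in every one of the T global filters, the bucket it hashes to is marked, that is, ∀q ∈ [T], x_{q,h_q(a)} = 1 (Step 4 of Figure 5.6). The sketch below assumes a hypothetical construction of the hash functions h_q by prefixing the filter index to SHA-256; the thesis only requires independent hash functions with uniform output. Indices are 0-based here, whereas the text uses 1-based q and j.

```python
import hashlib


def bucket(q: int, element: bytes, b: int) -> int:
    """Hypothetical h_q: filter q's bucket for `element`, in range(b)."""
    digest = hashlib.sha256(q.to_bytes(4, "big") + element).digest()
    return int.from_bytes(digest[:8], "big") % b  # modulo bias is negligible for small b


def passes_filters(element: bytes, filters) -> bool:
    """filters[q][j] is the global bit x_{q,j}; hot only if every filter's bucket is marked."""
    return all(f[bucket(q, element, len(f))] == 1 for q, f in enumerate(filters))
```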


5.4.2  Privacy  in  HotItem-ID 

In the following theorems, we prove the owner and data privacy of our HotItem-ID protocol.

Theorem 27. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest player
Protocol: HotItem-Pub


Input: There are n players, λ maliciously colluding, each with a private input set S_i. Each player
i ∈ [n] holds a private key sk_i, allowing him to construct one-show tags. h_1, ..., h_T are
independently chosen cryptographic hash functions. ρ, b, T, t are parameters known to all participants.

Output:  The  set  P  of  hot  items  published  by  honest  players. 

1. Each player i (1 ≤ i ≤ n) commits to his private input set S_i:

• If the players are not concerned with Owner-Privacy, each player constructs a hash tree from
his randomly-permuted private input set S_i, with root hash-value y_i, then sends y_i to all
players

• If the players wish to ensure Correlated Owner-Privacy, each player constructs a hash tree
from his randomly-permuted private input set S_i, with root hash-value y_i, then anonymously
routes y_i to all players

• If the players wish to ensure Uncorrelated Owner-Privacy, for each element a ∈ S_i, each player
constructs a commitment to a and anonymously routes it to every other player

2. All players 1, ..., n execute the HotItem-ID protocol (Figure 5.5), so that each player i (1 ≤ i ≤
n) obtains: (a) the set P_i ⊆ S_i of hot items in that player's private input set and (b) the global hot
item identification filters {x_{q,j}}_{j∈{1,...,b}} (1 ≤ q ≤ T)

3. All players 1, ..., n publish the hot items in their private input sets:

• If the players are not concerned with Owner-Privacy:

(a) for each element a ∈ P_i, each player i constructs a proof of inclusion in S_i, showing the
path from the leaf a to the root value y_i

(b) each player i distributes the elements of the set P_i to all other players, along with their
proofs of correctness, using the ElementDist protocol, such that all connected honest players
learn the set P = P_1 ∪ ··· ∪ P_n.

• If the players wish to ensure Correlated Owner-Privacy:

(a) for each element a ∈ P_i, each player i constructs a proof of inclusion in S_i, showing the
path from the leaf a to the root value y_i, and anonymously routes it to ρ randomly chosen players.

(b) the players distribute all the elements and proofs received in Step 3a using the
ElementDist protocol, such that all connected honest players learn the set P = P_1 ∪ ··· ∪ P_n.

• If the players wish to ensure Uncorrelated Owner-Privacy:

(a) for each element a ∈ P_i, each player i opens his commitment to a and anonymously routes
the opening to ρ randomly chosen players.

(b) the players distribute all the elements and proofs received in Step 3a using the
ElementDist protocol, such that all connected honest players learn the set P = P_1 ∪ ··· ∪ P_n.

4. Each player i = 1, ..., n verifies that each element a ∈ P:

• was committed to with a valid commitment by some player in Step 1

• passes the filter: ∀q ∈ [T], x_{q,h_q(a)} = 1


Figure 5.6: HotItem-Pub protocol, for publishing the hot items in each player's private input set.


sent any given anonymous message with probability more than negligibly different from a random
guess. In the HotItem-ID protocol, for any element a, no coalition of at most λ malicious
players can gain more than a negligible advantage in determining if a ∈ S_i, for any given honest
player i (1 ≤ i ≤ n).

We  defer  the  proof  of  Theorem  27  to  Appendix  B.2.2. 

When considering data privacy, we wish to prove that the cold elements in S_1 ∪ ··· ∪ S_n remain
hidden from adversaries. For example, in the HotItem-ID protocol, no player or coalition of
at most λ malicious players may gain more than an expected I = lg h bits of information
about all players' inputs ⋃_{i=1}^n S_i. We also prove a tighter bound on the degree to which each
element is hidden in a crowd of indistinguishable elements. Two elements are indistinguishable
if an attacker cannot distinguish which one was in the players' private input sets based on the
information gained in HotItem-ID. For an element a, its indistinguishable set consists of all
elements indistinguishable from it by an adversary who has no prior knowledge. To provide
data privacy, we wish for cold items to have large indistinguishable sets; we relax perfect
(semantic) security, in which all items are indistinguishable, to achieve much greater efficiency.

Theorem 28. In the HotItem-ID protocol, each element a, which appears in f_a distinct
players' private input sets, has an indistinguishable set of expected size ^(f_a) =

We  defer  the  proof  of  Theorem  28  to  Appendix  B.2.3. 

We graph the ratio of the indistinguishable set size to the domain size |M| in Figure 5.7.

When an item a appears in f_a players' private datasets, a higher ratio indicates that it is
harder for adversaries to distinguish a from other items in the domain M. Note that if a
appears in only a few players' private input sets, a very large proportion of the domain is
indistinguishable from a. As f_a grows, the size of the indistinguishable set decreases;
this property ensures that truly rare elements are highly protected. In many applications,
there is a large gap between the frequency of hot items and that of cold items. In this case,
our protocol guarantees that the cold items will be extremely well protected, as their frequency
will be far below that of the hot items.

5.4.3  Privacy  in  HotItem-Pub 

We  now  consider  the  degree  of  owner  privacy  conferred  by  each  version  of  our  HotItem-Pub 
protocol.  Correlated  owner  privacy  allows  adversaries  to  link  the  items  published  by  each  player, 
while  uncorrelated  owner  privacy  prevents  this. 

Theorem 29. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest player
sent any given anonymous message with probability more than negligibly different from a random
guess. In the Correlated Owner-Private HotItem-Pub protocol, for any element a, no
coalition of at most λ malicious players can gain more than a negligible advantage in determining
if a ∈ S_i, for any given honest player i (1 ≤ i ≤ n), assuming that the adversary is given
the set of hot items P and the frequency of each hot item.

Theorem 30. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest player
sent any given anonymous message with probability more than negligibly different from a random
guess. In the Uncorrelated Owner-Private HotItem-Pub protocol, for any element a, no
coalition of at most λ malicious players can gain more than a negligible advantage in determining
if a ∈ S_i, for any given honest player i (1 ≤ i ≤ n), assuming that the adversary is given
the set of hot items P and the frequency of each hot item.

Additionally, given two elements a, a' ∈ P, no coalition of at most λ malicious players
can gain more than a negligible advantage in determining if there exists an honest player i
(1 ≤ i ≤ n) such that a, a' ∈ S_i, assuming that the adversary is given the set of hot items P
and the frequency of each hot item.
of data privacy would be 1 for all frequencies, but by compromising this strong definition of security,
we obtain more efficient and robust protocols. We graph the degree of data privacy (a), showing the
increase in protection for rare elements. The same function is graphed in (b) on a logarithmic scale, for
increased detail.


We  prove  these  theorems  in  Appendix  B.2.3. 


5.4.4  Performance 

By utilizing a novel combination of approximate counting algorithms, our protocol represents
a clear gain in efficiency over exact counting. In particular, aside from the cost of anonymously
routing messages, which varies according to the scheme employed, our protocol achieves a
multiplicative factor of ^ more efficiency than the exact-counting-based non-privacy-preserving
protocol using the same message propagation abilities. (A network with improved capability
for routing messages will increase the performance of both protocols.) Note that this improvement
in performance is significant, especially considering that the baseline protocol provides
no privacy at all, whereas our protocols enforce principled guarantees of privacy while retaining
efficiency. In particular, if k is a fraction of n, our protocol achieves constant per-player
communication overhead. We present experimental results to validate the efficiency of our protocols
in distributed networks in Section 5.6.


5.5  Extensions 

In  this  section,  we  briefly  describe  several  extensions  to  our  HotItem-ID  and  HotItem-Pub 
protocols. 


Private Input Multisets. Instead of identifying hot items as those that appear in at least
k players' private input sets, we may identify those items that appear at least k times in the
players' private input multisets. In the modified protocol, we associate ψ one-show values with a
single (filter, bucket) pair so that up to ψ hits to a single bucket by a single player can be counted
as distinct elements. The constants b, T may be chosen by slightly adjusting the analysis given
in the proof of Theorem 26.
Using Bloom Filters. Bloom filters provide a compact probabilistic representation of set
membership [6]. Instead of using T filters, we can use a combined Bloom filter. This achieves the
same asymptotic communication complexity; however, in practice the Bloom filter approach can
have smaller constants. In the interest of clarity, we describe our approach using T filters;
our scheme can be easily adapted to use Bloom filters. The choices of the constants
b, T can be adjusted to give an acceptable false-positive rate, given the domain M; the analysis
may be performed similarly to that of [6] and the analysis of the HotItem-ID protocol given
in this chapter.
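For readers unfamiliar with [6], a standard Bloom filter keeps one bit array of size m and sets k positions per inserted item; membership holds only if all k positions are set, so it has false positives but no false negatives. In the combined-filter variant described above, an element's k positions in a single array play the role of its T buckets in separate filters. The sketch below is a generic Bloom filter, not the thesis's construction: deriving the k positions from SHA-256 with an index prefix is our assumption, and the per-bucket counting of distinct players is not shown.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: m positions, k hash positions per item."""

    def __init__(self, m: int, k: int):
        if m < 1 or k < 1:
            raise ValueError("m and k must be positive")
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per position, for clarity

    def _positions(self, item: bytes):
        # Assumed construction: the i-th position comes from SHA-256(i || item).
        for i in range(self.k):
            d = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: bytes) -> bool:
        # True for every added item; may also be True for items never added.
        return all(self.bits[p] for p in self._positions(item))
```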

Theorem 31. Given the false positive rate δ+ and the false negative rate δ−, error bounds ε
and β, and the upper limit λ on the number of malicious participants, let b, t, T, ρ be chosen
as follows: t := [ff], ρ := O (^lg α := O /mT[, b and T are chosen to minimize b × T, and at
the same time, satisfy - - 1" ^ < δ+.

In the HotItem-ID protocol, with probability at least 1 − δ+, every element a that appears
in f_a < βk players' private input sets is not identified as a k-threshold hot item.

In the HotItem-ID protocol, with probability at least 1 − δ−, every element a that appears
in f_a > players' private input sets is identified as a k-threshold hot item.

We defer the proof of Theorem 31 to Appendix B.2.4. As the properties of owner
and data privacy are unchanged by this variant, we do not perform a separate analysis.


Top-k. The Top-k problem is identifying those k elements that appear most often in the
players' private input sets. An efficient non-private solution to this problem is given in [10]. In
their protocol, a trusted authority calculates an estimated threshold value τ; every element in
the top k appears at least τ times.

We now briefly describe how to calculate the top k values without use of a central authority
and while preserving each player's privacy. As in HotItem-ID, the players calculate global
filters, but by using t-minimum value aggregation instead of t-collection. In this way, each player
may estimate the number of players who hit each (filter, bucket) pair. Using these estimates,
the players may approximate τ, marking each (filter, bucket) pair hit by at least τ players as 1.
The global filters may then be used to identify the top k.
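The per-bucket estimate in this extension comes from the t smallest hash values of the tags that hit a bucket. The sketch below uses the classic distinct-element estimator t·2^κ / H(v), where H(v) is the t-th smallest κ-bit hash value, in the style of the algorithms cited as [4]; the truncated SHA-256 hash, κ = 64, and treating tags as arbitrary byte strings (rather than one-show tags) are assumptions made for illustration. Marking a bucket as 1 when its estimate reaches τ follows the description above.

```python
import hashlib

KAPPA = 64  # assumed hash output length in bits


def hash_value(tag: bytes) -> int:
    """Map a tag to an integer in [0, 2**KAPPA) via truncated SHA-256."""
    return int.from_bytes(hashlib.sha256(tag).digest()[: KAPPA // 8], "big")


def estimate_distinct(tags, t: int) -> float:
    """Estimate the number of distinct tags from the t-th smallest hash value."""
    values = sorted({hash_value(tag) for tag in tags})
    if len(values) < t:
        return float(len(values))  # fewer than t distinct tags: the count is exact
    return t * 2 ** KAPPA / max(values[t - 1], 1)


def mark_buckets(bucket_tags, t: int, tau: float):
    """bucket_tags[j] holds the tags seen for bucket j; mark 1 when the estimate reaches tau."""
    return [1 if estimate_distinct(tags, t) >= tau else 0 for tags in bucket_tags]
```

Larger t tightens the estimate (its relative error shrinks roughly like 1/√t) at the cost of collecting more tags per bucket.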


5.6  Experimental  Results 


In order to experimentally evaluate the efficiency and accuracy of our protocol, we implemented
our HotItem-ID protocol with t-collection and applied it to distributed worm signature detection.
As we describe in Section 5.6.3, our experimental results support the efficiency and
accuracy of our protocol.

5.6.1  Distributed  Worm  Signature  Detection 

A  widely  utilized  tactic  for  detecting  worms  is  to  search  for  byte  strings  that  appear  with 
unusually  high  frequency  in  network  traffic  [35,  52].  By  distributing  the  execution  of  this 
strategy  over  a  large  number  of  hosts,  players  can  increase  the  accuracy  of  their  results  [35]. 
Figure 5.8: Number of hosts (players) that have generated each content block (item) from observed
suspicious flows.


However,  as  network  traffic  often  contains  sensitive  information  such  as  email  or  information 
that  could  aid  a  network  attack,  it  is  imperative  to  protect  the  privacy  of  participants. 

In a distributed network monitoring scenario, each monitor attempts to detect malicious
traffic on his own network. As anomaly detection is imperfect, this process will often identify
innocuous traffic as malicious. Thus, monitors must take further steps to improve the accuracy
of their results, such as comparing them with the results of other monitors. If the malicious
traffic belongs to a worm, there will be other monitors that observed similar traffic; an effective
heuristic for identifying possibly malicious traffic is that worms often send a large volume of
repetitive data as they attempt to infect other computers. We use our HotItem-ID protocol to
perform this comparison while protecting the privacy of non-worm network traffic.

Because  polymorphic  worms  often  change  their  form,  players  must  compare  many  small 
pieces  of  a  possible  worm  payload  instead  of  the  entire  payload.  We  use  a  content-based 
payload  partitioning  technique  (proposed  in  [35])  to  split  a  payload  into  many  small  segments. 
In  this  section,  we  call  such  segments  content  blocks.  The  payload  partitioning  technique  is 
robust  against  small  payload  byte  changes  and  generates  the  same  content  blocks  for  different 
forms  of  a  worm.  Once  monitors  have  identified  possible  worms  and  split  them  into  content 
blocks,  they  then  perform  hot  item  identification  over  the  sets  of  content  blocks.  Content  blocks 
generated  from  innocuous,  private  network  traffic  appear  only  in  the  sets  from  a  few  monitors. 
Hot  content  blocks  indicate  that  the  traffic  has  been  seen  at  an  unusually  large  number  of 
hosts;  it  is  therefore  almost  certainly  part  of  a  worm  attacking  those  hosts  and  can  be  used  as 
a  signature  of  the  worm. 
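Content-based partitioning places block boundaries wherever a hash of the last few bytes matches a pattern, so a boundary depends only on nearby content and survives insertions elsewhere in the payload. The sketch below illustrates that idea only: it substitutes a simple polynomial rolling hash for the Rabin fingerprints used in [35], and its window, modulus, and target values are hypothetical. It also omits the minimum and maximum block sizes a real partitioner would enforce.

```python
def content_blocks(data: bytes, window: int = 16, modulus: int = 64, target: int = 0,
                   base: int = 257, prime: int = (1 << 61) - 1):
    """Split `data` where the rolling hash of the last `window` bytes is target mod `modulus`."""
    if window < 1 or modulus < 1:
        raise ValueError("window and modulus must be positive")
    top = pow(base, window - 1, prime)  # weight of the byte about to leave the window
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % prime  # drop the outgoing byte
        h = (h * base + byte) % prime                 # shift in the new byte
        # Once the window is full, h depends only on data[i - window + 1 : i + 1].
        if i + 1 >= window and h % modulus == target:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])
    return blocks
```

Because each boundary decision looks only at the current window, prepending bytes to a payload shifts every later boundary by the same amount, so all blocks after the first one reappear unchanged.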

5.6.2  Real-world  Data  and  Experiment  Method 

We performed simulated distributed network monitoring on traces captured on a campus network
that uses one third of a class B IP address space. HTTP-tr1, HTTP-tr2, and HTTP-tr3
are one-hour-long traces containing all HTTP (TCP port 80) packets and payloads addressed to
hosts in the monitored network. Similarly, SMTP-tr1, SMTP-tr2, and SMTP-tr3 contain SMTP
(TCP port 25) packets captured for a one-hour time period. During the trace collection periods,
a total of 5246 IP addresses received at least one packet. Our trace does not contain any known
Figure 5.9: Normalized bandwidth consumption per player in performing hot item identification (k =
100). Underlined values indicate that there were false positives or false negatives.


worm traffic and does not contain hot items (threshold k = 100). We graph the number of
hosts that generate each innocuous content block (by identification of possible worm traffic) in
Figure 5.8. At most 2.2% (HTTP) or 1.4% (SMTP) of content blocks appear at more than
one host. Through manual examination, we determined that content blocks that appeared at
more than one host indicated web crawler (HTTP) or spamming (SMTP) activity, and never
appeared at more than 45 hosts.

We injected simulated worm traffic into the input of 200 of the 1024 monitors in the network.
Each monitor generates a set of content blocks from captured anomalous traffic. Note that
all the sets generated by those 200 attacked monitors include the content blocks from the
worm traffic. We ran our HotItem-ID protocol in the overlay network of 1024 monitors and
measured the number of messages and the required bandwidth per monitor, while varying the
t-collection parameters t and γ, and the average number of neighbors in the overlay network.
We also computed the false positive and false negative rates at the end of the HotItem-ID
execution by counting the innocuous content blocks identified as hot (false positives)
and the worm content blocks that were not identified (false negatives). In our experiment, we
used the parameters b := 606, T := 5 for HTTP traces, and b := 4545, T := 5 for SMTP
traces, chosen according to the guidelines in Section 5.4.1. We compared our protocol to a
non-private naive protocol, in which all content blocks are forwarded to all 1024 participating
monitors, using the same network topology and the same communication model for both the
naive protocol and the HotItem-ID protocol.


5.6.3  Bandwidth  Consumption  and  Accuracy 

Our HotItem-ID protocol efficiently identified every simulated worm injected into the network
traces while generating no false positives; no innocuous data was mistakenly categorized as
malicious, except when we used SMTP-tr1 with t < 3 and γ > 0.5. We present our comparison
of the required bandwidth and messages in Figure 5.9. HTTP-tr3 contains 724 unique content
blocks, while SMTP-tr1 contains 46,120 unique ones. Underlined values in the figure indicate
that there were false negatives (failed to identify worm content blocks). Our HotItem-ID protocol
scales better than the naive protocol (Section 5.2); as problems increase in size, our protocol
becomes more attractive. Even at small problem sizes, the overhead required to protect the
privacy of participants is not high. Our experiments show that to ensure correctness while
retaining efficiency, we may set γ := 0.2, t := 3 (HTTP) and γ := 0.5, t := 5 (SMTP). Our
HotItem-ID implementation, based on an efficient group signature scheme [7], requires 193
bytes per message (Appendix B.3), while the naive protocol uses only 20 bytes per message.¹
However, our protocol requires only a small number of message transmissions; as a result,
HotItem-ID used only 39% and 58% of the bandwidth used by the naive protocol in the
HTTP-tr3 and SMTP-tr1 experiments, respectively. Note that the performance gain from our
HotItem-ID protocol increases as more monitors participate. For example, when 10240
monitors participate in worm signature generation and we need only 10% of them to catch
worm traffic (k = 1000), our HotItem-ID protocol uses less than 6% of the bandwidth used
by the naive protocol.
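The reported figures also let us back out roughly how many messages HotItem-ID sent relative to the naive protocol. This back-of-envelope sketch assumes, for illustration only, that bandwidth equals the number of messages times the per-message size (193 bytes versus 20 bytes), ignoring headers and any per-message overhead that the text does not break down.

```python
HOTITEM_MSG_BYTES = 193  # per-message size reported for HotItem-ID
NAIVE_MSG_BYTES = 20     # per-message size of the naive protocol (a SHA-1 digest)


def relative_message_count(bandwidth_ratio: float) -> float:
    """HotItem-ID messages as a fraction of naive-protocol messages,
    assuming bandwidth = messages x bytes per message."""
    return bandwidth_ratio * NAIVE_MSG_BYTES / HOTITEM_MSG_BYTES
```

Under this assumption, the 39% (HTTP-tr3) and 58% (SMTP-tr1) bandwidth ratios correspond to about 0.39 × 20/193 ≈ 4.0% and 0.58 × 20/193 ≈ 6.0% of the naive protocol's message count.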


¹ To save bandwidth and give a trivial measure of privacy against casual attacks [37], we hash each content
block with SHA-1 in the naive protocol.
Chapter 6


Conclusion 


In this thesis, we have introduced multiple techniques and protocols for privacy-preserving
distributed information sharing. We designed composable set and multiset operations, and
protocols for Set-Intersection, Over-Threshold Set-Union, Cardinality Set-Intersection, Threshold
Set-Union, Subset, and CNF formula evaluation based on these operations. We then examined
the problems of hot item identification and publication, variants of the Over-Threshold Set-Union
problem. In order to increase the efficiency and robustness of our hot item protocols, we
designed these protocols to achieve novel definitions of security and privacy.
Appendix A


Appendices  for  Privacy-Preserving 
Set  Operations 

A.1 Notation

• P - the set of elements which can be members of a private input set

•  k  -  size  of  each  private  input  set 

•  n  -  number  of  players  participating  in  a  protocol 

•  t  -  threshold  number,  an  element  must  appear  t  times  in  the  private  input  sets  to  be 
included  in  the  threshold  set 

• E_pk(·) - encryption under the additively homomorphic, public key cryptosystem whose secret key
is shared among all players

• E_pk(a) +_h E_pk(b) - combination of two ciphertexts (under the homomorphic cryptosystem)
to produce a re-randomized ciphertext which is the encryption of a + b

• a ×_h E_pk(b) - combination of an integer and a ciphertext (under the homomorphic
cryptosystem) to produce a re-randomized ciphertext which is the encryption of ab

• f *_h E_pk(g) - combination of two polynomials (under the homomorphic cryptosystem) to
produce a re-randomized encrypted polynomial which is the encryption of f * g

• F_0, ..., F_d - public 'helper' polynomials for computing element reduction

• h(·) - a cryptographic hash function from {0,1}* to {0,1}^ = lg (^)), where ε is negligible.

• Rd_d(S) denotes the element reduction by d of set S

• denotes the set of all polynomials of degree between 0 and a with coefficients from R

• [c] for an integer c denotes the set {1, ..., c}

• a := b denotes that the variable a is given the value b

• a ∥ b denotes a concatenated with b

• a ← S denotes that element a is sampled uniformly from set S

• f * g is the product of the polynomials f, g

• deg(p) is the degree of polynomial p

• p^(d) is the dth formal derivative of p

•  gcd(p,  q)  is  the  greatest  common  divisor  of  p,  q 

•  Si  is  the  ith  player’s  private  input  set 

•  Vj  is  the  jth  element  of  the  set  V,  under  some  arbitrary  ordering 
Appendix B


Appendices  for  Hot-Item 
Identification 

B.1 Notation

Notation.

• a ∥ a' - a and a' concatenated

•  a  :=  a'  -  the  value  of  a'  is  assigned  to  a 

Variable  List. 

•  a  -  an  item,  member  of  a  private  input  set 

•  b  -  number  of  buckets  per  filter 

• d - d(x) is a measure of data privacy which varies by the number of players holding
any particular item

• C_i - group certificate for player i (1 ≤ i ≤ n) in the group signature scheme for one-show tags

• f_a - number of players' private input sets in which an element a appears

• g_1, g_2 - group elements in the one-show tag scheme

• g_{q,j} - group element for producing a one-show value (1 ≤ q ≤ T, 1 ≤ j ≤ b)

• H(·) - a cryptographic hash function with range {0,1}^κ

• h_1, ..., h_T(·) - independent cryptographic hash functions with range [b]

• i - index over players (1 ≤ i ≤ n)

• I - expected number of bits of information that any coalition can learn about players'
private input sets

• j - index over buckets (1 ≤ j ≤ b)

•  k  -  minimum  number  of  players  that  must  hold  an  alert  for  it  to  be  published 

• ℓ - index

•  m  -  maximum  size  of  Si 

•  M  -  domain  of  elements  (UILi 

•  n  -  the  number  of  players 

•  o  -  a  secret  value  used  in  constructing  one-show  tags 

•  Pe  -  probability  of  edge  existence  in  a  random  network  graph 

•  Pt  -  probability  that  an  anonymously  routed  message  is  traced  to  its  source 

•  Pi  -  each  player’s  set  of  alerts  for  publication 
• P = ⋃_{i=1}^n P_i

• q - index over filters (1 ≤ q ≤ T)

• r_1, r_2 - prime numbers used in one-show tags

• s - group manager's secret element in group signature scheme

• s_i - each player's secret element in group signature scheme (1 ≤ i ≤ n)

• S_i - each player's private input set (1 ≤ i ≤ n)

• S = ⋃_{i=1}^n S_i

•  t  -  number  of  values  collected  to  estimate  the  number  of  distinct  elements  in  a  set 

•  T  -  number  of  filters 

•  u  -  estimated  set  size 

• U_i - the tags collected during a t-collection or t-minimum value aggregation protocol by
player i (1 ≤ i ≤ n)

• v - a one-show value

• [x_{i,q,j}]_{j∈[b]} - local filter q (1 ≤ q ≤ T) for player i (1 ≤ i ≤ n)

• [x_{q,j}]_{j∈[b]} - global filter q (1 ≤ q ≤ T)

• y_i - the root of a Merkle hash tree, a commitment to the input set S_i of player i (1 ≤ i ≤ n)

• z_0, z_1, z_2 - elements of the group signature public key for one-show tags

• Z_n - the ring of integers modulo n

• α - number of separate approximate distinct element counting estimations that must be
combined to obtain the desired confidence probability

• β - secret value for ZK proof in one-show tags scheme

• γ - "gap" factor in approximate distinct element counting

• δ_1 - confidence probability for the approximate distinct element counting algorithm

• δ+ - maximum probability of false positive

• δ− - maximum probability of false negative

• ε - allowed error bound for the approximate distinct element counting algorithm: the estimate is
between (1 − ε) times the actual value and (1 + ε) times the actual value

• κ - security parameter for the modified unpredictable-value group signature scheme

• λ - number of malicious players

• μ - secret value for ZK proof in one-show tags scheme

• η - part of the public key for the one-show scheme, η = r_1 r_2

• φ - secret value for ZK proof in one-show tags scheme

• ψ - maximum number of duplicates of each element allowed in a private input multiset
(see Section 5.5)

• ρ - number of randomly chosen players that receive anonymous messages in
HotItem-ID and HotItem-Pub, so that, with high probability, at least one message
reaches an honest player

• τ - threshold value for the top-k protocol extension to HotItem-ID (see Section 5.5)

•  a  -  seed  for  pseudo-random  number  generator 

•  cj£  -  the  ^\h  member  of  Z^,  output  by  a  pseudo-random  number  generator  on  input  a 

•  ^ifa)  -  expected  size  of  indistinguishable  set,  if  element  a  appears  in  fa  players’  private 
inputs 

•  tp  -  average  number  of  neighbors  per  node 
B.2 Detailed Analysis


B.2.1  Correctness 

In this section, we analyze our HotItem-ID protocol to ensure that, given certain choices of the
parameters b, T, our protocol correctly identifies hot items with high probability. In analyzing
this protocol, we must consider both false positives (in which cold items are identified as hot)
and false negatives (in which hot items are identified as cold). Errors may be caused by
inaccurate approximate distinct element counting, a badly constructed filter, or a combination
of the two.

Recall that h_1, ..., h_T : {0,1}^κ → {1, ..., b} are polynomial-wise independent hash functions
with uniformly distributed output, such as cryptographic hash functions, and that H is a
collision-resistant cryptographic hash function [40].
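As a rough illustration of the "badly constructed filter" error source, consider an idealized model (our simplification, not the analysis carried out in this appendix): a cold element's buckets h_1(a), ..., h_T(a) are uniform and independent, and the marked buckets were set by other elements' hits. The element is then misidentified as hot only if it lands on a marked bucket in every filter, which happens with probability equal to the product of the per-filter fractions of marked buckets.

```python
from fractions import Fraction


def cold_pass_probability(filters) -> Fraction:
    """Idealized chance that a cold element hits a marked bucket in every filter.

    filters[q][j] is 1 if bucket j of filter q is marked; buckets are assumed
    uniform and independent across filters.
    """
    p = Fraction(1)
    for f in filters:
        p *= Fraction(sum(f), len(f))  # fraction of marked buckets in this filter
    return p
```

For instance, with half of one filter's buckets marked and a quarter of another's, a cold element passes both with probability 1/2 × 1/4 = 1/8, which is why adding filters drives this error source down geometrically.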


Error  from  Approximate  Distinct-Element  Counting.  To  begin,  we  state  a  simple 
lemma  that  shows  the  t-collection  and  t-minimum  value  aggregation  protocols  (Figures  5.3 
and  5.4)  obtain  the  information  needed  for  approximating  the  number  of  ‘hits’  to  any  particular 
bucket. 

Lemma  32.  Each  participant  in  the  t-collection  protocol  of  Figure  5.3  collects  t  ‘small-value’ 
one-show  tags  created  by  distinct  players,  if  such  tags  exist. 

Each participant in the t-minimum value aggregation protocol of Figure 5.4 obtains the $t$
one-show tags created by distinct players with the smallest hash values.


Proof. The proof of this lemma relies on the proof of information distribution in [46], which
uses a closely related information distribution mechanism. We can thus observe that all
information held by connected honest players is distributed to all connected players unless it is
specifically filtered by an honest player. In the t-minimum value aggregation protocol, the only
tags filtered are those which do not appear in the final result; that is, they are not among the
$t$ one-show tags with smallest hash value. Note also that the one-show value of a tag is unique
per player, and that, with overwhelming probability, no player can construct tags with different,
valid one-show values. Thus, in the t-minimum value aggregation protocol, all players
obtain the $t$ one-show tags with smallest hash values created by distinct players.

We may reason similarly about our t-collection protocol. Each player only filters tags (by
terminating his involvement in the protocol) if he has collected $t$ tags and sent them on to
all neighbors. Thus, the player has ensured that if $t$ small hash-value tags exist, each of his
neighbors collects them as well. By induction, all connected honest players thus collect $t$
small-value tags created by distinct players, if such tags exist. □
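
To make the convergence argument concrete, here is a minimal Python sketch of t-minimum value aggregation under simplifying assumptions: the graph is synchronous and fully honest, tags carry no signatures, and each tag is represented only by its one-show value. The helper names (`tag_hash`, `order_key`, `t_min_aggregate`) and the use of truncated SHA-256 as the tag hash are illustrative choices, not part of the protocol specification.

```python
import hashlib

def tag_hash(one_show_value: str) -> int:
    # Hypothetical stand-in for a tag's hash value: SHA-256 truncated to 64 bits.
    digest = hashlib.sha256(one_show_value.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def order_key(one_show_value: str):
    # Break (astronomically unlikely) hash ties deterministically.
    return (tag_hash(one_show_value), one_show_value)

def t_min_aggregate(graph, local_tags, t):
    """Gossip until no node's set of the t smallest tags changes.

    graph: dict mapping node -> list of neighbouring nodes.
    local_tags: dict mapping node -> set of one-show values that node created.
    Tags are keyed by one-show value, so each player is counted at most once.
    """
    state = {v: set(sorted(local_tags.get(v, ()), key=order_key)[:t]) for v in graph}
    changed = True
    while changed:
        changed = False
        for v in graph:
            merged = set(state[v])
            for u in graph[v]:
                merged |= state[u]
            # Filter out everything except the t smallest tags seen so far.
            kept = set(sorted(merged, key=order_key)[:t])
            if kept != state[v]:
                state[v] = kept
                changed = True
    return state
```

On a connected graph every node should end with the same $t$ tags, namely the $t$ globally smallest ones; keying tags by one-show value mirrors the requirement that they come from distinct players.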


As detailed in [4], if $\mathcal{H}(v)$ is the $t$th smallest hash value in a set $S$, where $\mathcal{H} : \{0,1\}^* \to \{0,1\}^\ell$,
then the estimate of $|S|$ is $t 2^\ell / \mathcal{H}(v)$. By computing the median of $O(\lg(1/\delta_1))$ such estimates, with
computationally independent hash functions, we obtain $\widehat{|S|}$, an $(\epsilon, \delta_1)$-approximation of $|S|$:

Lemma 33. This algorithm, for any $\epsilon, \delta_1 > 0$, $(\epsilon, \delta_1)$-approximates $|S|$. That is,

$$\Pr\left[\widehat{|S|} > (1+\epsilon)|S| \;\vee\; \widehat{|S|} < (1-\epsilon)|S|\right] < \delta_1.$$

Proof. A proof of this lemma is given in [4].


□ 
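
The estimator of [4] described above can be sketched in Python as follows. This is an illustrative sketch only, assuming a 64-bit hash output ($\ell = 64$) and seeded SHA-256 as a stand-in for computationally independent hash functions; `H`, `estimate` and `median_estimate` are hypothetical names.

```python
import hashlib
import statistics

ELL = 64  # assumed hash output length (ell) in bits

def H(seed: int, item: str) -> int:
    # Seeded SHA-256 truncated to ELL bits: a stand-in for independent hash functions.
    digest = hashlib.sha256(f"{seed}|{item}".encode()).digest()
    return int.from_bytes(digest[: ELL // 8], "big")

def estimate(items, t, seed=0):
    """One estimate t * 2^ell / H(v), where H(v) is the t-th smallest hash value."""
    values = sorted({H(seed, x) for x in items})
    if len(values) < t:
        return float(len(values))  # fewer than t distinct hashes: the count is exact
    v_t = max(values[t - 1], 1)    # t-th smallest hash value; guard against a zero hash
    return t * 2**ELL / v_t

def median_estimate(items, t, runs):
    # Median of estimates under independent seeds, as in the (eps, delta_1)-approximation.
    return statistics.median([estimate(items, t, seed) for seed in range(runs)])
```

Duplicates in the input do not affect the result, since only distinct hash values are retained.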
As this is an $(\epsilon, \delta_1)$-approximation algorithm, we are concerned with
elements that appear fewer than $k/(1+\epsilon)$ times when computing false positives, and elements that
appear at least $k/(1-\epsilon)$ times when computing false negatives. Based on this algorithm, we may
conclude that, if there exist at least $t$ elements with hash values at most $t 2^\ell / k$, our estimate of
$|S|$ is at least $k$:

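Corollary 34. If there exist $t$ values $v_1, \ldots, v_t \in S$ such that $\mathcal{H}(v_j) \le t 2^\ell / k$ for all $j \in [t]$, then the
approximate distinct element counting algorithm of [4] will estimate that $\widehat{|S|} \ge k$.

Proof. Let $w = \max_{j \in [t]} \{\mathcal{H}(v_j)\}$. The $t$th smallest hash value in $S$ is at most $w$, so the approximation
algorithm's estimate satisfies $\widehat{|S|} \ge t 2^\ell / w$. As $w \le t 2^\ell / k$, we have $\widehat{|S|} \ge k$. □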


Error from Filters. We now calculate the probability that an element $a$, which appears in
$f_a < k'$ players' private input sets, is identified as a $k'$-threshold hot item. We set $k' := k/(1+\epsilon)$ so
as to account for the allowed inaccuracy in the approximate counting algorithm.

As illustrated in Figure B.1, we may bound the probability that $a$ is erroneously identified
as $k'$-hot by one filter $j'$ ($1 \le j' \le T$) by determining the maximum number of filter buckets
that were hit by $k' - f_a$ distinct players¹. If a bucket $j$ ($1 \le j \le b$) was hit by $k' - f_a$ players
who do not hold $a$, and $h_{j'}(a) = j$, then $a$ will be identified by filter $j'$ as a $k'$-threshold
hot item. As malicious players may claim to hit all buckets, a minimum of $k' - f_a - \lambda$ honest
players must hit any given bucket for it to cause any possibility of error.

Each honest player has a maximum of $m$ items in their private input set. Thus, using every
element of each player's private input set, each group of $k' - f_a - \lambda$ honest players may hit
at most $m$ buckets a sufficient number of times to introduce any danger of error². There are
$n - f_a - \lambda$ honest players who do not hold $a$ as an element of their private input set, and thus
$\lfloor (n - f_a - \lambda)/(k' - f_a - \lambda) \rfloor$ groups of players that can hit $m$ buckets per group enough times to allow danger
of an error. Note that any group of fewer than $k' - f_a - \lambda$ players cannot hit any particular
bucket a sufficient number of times to cause a total of $k' - f_a$ hits; each player may only hit
any particular bucket at most once.

Thus, at most $m \lfloor (n - f_a - \lambda)/(k' - f_a - \lambda) \rfloor$ buckets of each filter can be 'unsafe'; if $a$ is not mapped to one
of those buckets, then there is no possibility of error from the malfunctioning of the filter. As
$h_{j'}(a)$ is distributed computationally indistinguishably from uniformly over $[b]$, we may thus
bound the probability that bucket $h_{j'}(a)$ is erroneously designated as 'hot':

$$\Pr[a \text{ is identified as } k'\text{-hot by one filter}] \le \frac{m \left\lfloor \frac{n - f_a - \lambda}{k' - f_a - \lambda} \right\rfloor}{b}$$
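
The unsafe-bucket count and the resulting per-filter bound can be evaluated directly. The following is a minimal sketch with hypothetical function names, treating $k'$, $f_a$ and $\lambda$ as plain numbers:

```python
import math

def unsafe_buckets(m, n, f_a, lam, k_prime):
    """Upper bound on the number of 'unsafe' buckets in one filter."""
    group = k_prime - f_a - lam  # honest hits a bucket needs once lam malicious hits are added
    if group <= 0:
        return math.inf  # malicious players alone can reach k' - f_a hits: no useful bound
    # Each full group of honest non-holders can push at most m buckets over the threshold.
    return m * math.floor((n - f_a - lam) / group)

def filter_error_bound(m, n, f_a, lam, k_prime, b):
    """Pr[a is identified as k'-hot by one filter] <= unsafe / b, capped at 1."""
    return min(1.0, unsafe_buckets(m, n, f_a, lam, k_prime) / b)
```

For example, with $m = 10$, $n = 1000$, $f_a = 5$, $\lambda = 10$, $k' = 100$ and $b = 10000$, there are at most $10 \lfloor 985/85 \rfloor = 110$ unsafe buckets, giving a per-filter bound of $0.011$.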


Combined  Error.  We  may  now  consider  the  two  sources  of  error  together.  There  are  two 
possible  error  types:  false  positives  (in  which  cold  items  are  identified  as  hot),  and  false  negatives 
(in  which  hot  items  are  not  identified  as  hot). 

Theorem 26. Given the false positive rate $\delta_+$ and the false negative rate $\delta_-$, error bounds $\epsilon$
and $\beta$, and the upper limit $\lambda$ on the number of malicious participants, let $b, t, T, \rho$ be chosen as
follows: $t := \lceil 96/\epsilon^2 \rceil$; $\rho$ as discussed under Choice of Constants below; $\alpha := O(\lg(T/\delta_-))$; and $b$ and $T$
chosen to minimize $b \times T$ while satisfying

$$\left(\frac{m \left\lfloor \frac{n - \beta k - \lambda}{k/(1+\epsilon) - \beta k - \lambda} \right\rfloor}{b} + \frac{\delta_-}{T}\right)^T \le \delta_+.$$

In the HotItem-ID protocol, with probability at least $1 - \delta_+$, every element $a$ that appears
in $f_a \le \beta k$ players' private input sets is not identified as a $k$-threshold hot item.

In the HotItem-ID protocol, with probability at least $1 - \delta_-$, every element $a$ that appears
in $f_a \ge k/(1-\epsilon)$ players' private input sets is identified as a $k$-threshold hot item.

¹Note that the $f_a$ signatures related to the element $a$ will raise the total number of signatures to $k'$; if there
are at least $k'$ signatures, error in the approximate counting algorithm may cause $a$ to be identified as a $k$-hot
item.

²Note that, with high probability, more than one element of an honest player's private input set will hash to
the same bucket, and thus this is a conservative analysis.

Figure B.1: Buckets that are unsafe for $h_{j'}(a)$ ($1 \le j' \le T$) are those that have been marked as hit
by a sufficient number of players to allow the possibility that $a$ might be erroneously identified as a
$k'$-threshold hot item. In (1), $a$ is mapped to a sufficiently full bucket, causing $a$ to be erroneously
identified as a $k'$-hot item. In (2), $a$ is mapped to a safe bucket.

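Proof. The probability that an element $a$, which appears in $f_a \le \beta k$ players' private input
sets, is identified as a $k$-threshold hot item can be bounded as follows (recall that we have
set $k' = k/(1+\epsilon)$ to account for the allowed tolerance in approximate counting):

$$\begin{aligned}
\Pr[a \text{ is identified as } k\text{-hot}] &\le \Pr[\text{in all filters, element } a \text{ is identified as } k'\text{-hot} \vee \text{a set of size} < k' \text{ is approximated as} \ge k] \\
&= \prod_{j'=1}^{T} \Pr[\text{in filter } j', \text{ element } a \text{ is identified as } k'\text{-hot} \vee \text{a set of size} < k' \text{ is approximated as} \ge k] \\
&\le \left(\frac{m \left\lfloor \frac{n - f_a - \lambda}{k' - f_a - \lambda} \right\rfloor}{b} + \delta_1\right)^T
\end{aligned}$$

If an element $a$, which appears in $f_a \ge k/(1-\epsilon)$ players' private input sets, is not identified as a
$k$-threshold hot item, it is due to error in the set-counting approximation. When the number
of hits for every bucket in a filter is counted exactly, there can be no false negatives. Thus, we
may bound the probability of a false negative as follows:

$$\begin{aligned}
\Pr[a \text{ is not identified as } k\text{-hot}] &\le \Pr[\text{in at least one filter, a set of size} \ge k/(1-\epsilon) \text{ is approximated as} < k] \\
&\le \sum_{j'=1}^{T} \Pr[\text{in filter } j', \text{ a set of size} \ge k/(1-\epsilon) \text{ is approximated as} < k] \\
&\le \sum_{j'=1}^{T} \delta_1 = \delta_1 T
\end{aligned}$$

□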
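
The two bounds in the proof can be computed numerically. The following is hypothetical helper code, assuming, as the product in the proof does, that the $T$ filters behave independently:

```python
import math

def per_filter_bound(m, n, f_a, lam, k, eps, b, delta1):
    """Filter error plus counting error for a single filter, capped at 1."""
    k_prime = k / (1 + eps)  # k' := k / (1 + eps)
    group = k_prime - f_a - lam
    if group <= 0:
        return 1.0  # the filter bound is vacuous in this regime
    unsafe = m * math.floor((n - f_a - lam) / group)
    return min(1.0, unsafe / b + delta1)

def false_positive_bound(m, n, f_a, lam, k, eps, b, T, delta1):
    # a must be (mis)identified in every one of the T filters.
    return per_filter_bound(m, n, f_a, lam, k, eps, b, delta1) ** T

def false_negative_bound(T, delta1):
    # Union bound over the T filters.
    return min(1.0, T * delta1)
```

The false positive bound shrinks exponentially in $T$, while the false negative bound grows linearly in $T$, which is why $\delta_1$ must be scaled down with $T$.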


Choice of Constants. Given our analysis, we may outline how to choose the constants
$\delta_1, t, b, T$ based on the parameters $\epsilon, \delta_-, \lambda, n, m, \delta_+, \beta$.


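$\delta_1$. Recall that the probability of a false negative, for an element $a$ which appears in
$f_a \ge k/(1-\epsilon)$ players' private input sets, is required to be at most $\delta_-$. We may then choose
$\delta_1$ as follows:

$$\Pr[a \text{ is not identified as } k\text{-hot}] \le \delta_1 T = \delta_-, \quad\text{i.e.,}\quad \delta_1 := \delta_-/T.$$

$b, T$. Given this assignment of $\delta_1$, we may simplify our bound on the probability of a false
positive on an element $a$, which appears in $f_a \le \beta k < k/(1+\epsilon)$ players' private input sets, as
follows:

$$\Pr[a \text{ is identified as } k\text{-hot}] \le \left(\frac{m \left\lfloor \frac{n - f_a - \lambda}{k' - f_a - \lambda} \right\rfloor}{b} + \delta_1\right)^T \le \left(\frac{m \left\lfloor \frac{n - \beta k - \lambda}{k/(1+\epsilon) - \beta k - \lambda} \right\rfloor}{b} + \frac{\delta_-}{T}\right)^T \le \delta_+$$

The second inequality holds because, for $n > k'$, the ratio $(n - f_a - \lambda)/(k' - f_a - \lambda)$ increases with $f_a$.
We choose $b, T$ so as to minimize $b \times T$, while satisfying the above constraint.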
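
One straightforward way to perform this minimization is a brute-force search over $T$: for each $T$, set $\delta_1 := \delta_-/T$ and take the smallest $b$ that satisfies the constraint. This is a minimal sketch with a hypothetical upper limit `T_max` on the number of filters; it is not necessarily the procedure used for the experiments.

```python
import math

def choose_b_T(m, n, lam, k, eps, beta, delta_plus, delta_minus, T_max=64):
    """Return (b, T, delta1) minimising b*T subject to the Theorem 26 constraint."""
    group = k / (1 + eps) - beta * k - lam
    if group <= 0:
        raise ValueError("need k/(1+eps) - beta*k - lambda > 0")
    unsafe = m * math.floor((n - beta * k - lam) / group)
    best = None
    for T in range(1, T_max + 1):
        delta1 = delta_minus / T                     # from delta1 * T = delta_minus
        slack = delta_plus ** (1.0 / T) - delta1     # room left for the filter term
        if slack <= 0:
            continue                                 # this T cannot meet delta_plus
        b = max(1, math.ceil(unsafe / slack))        # smallest b with unsafe / b <= slack
        if best is None or b * T < best[0] * best[1]:
            best = (b, T, delta1)
    return best
```

Because $(\cdot)^{1/T}$ grows towards 1 as $T$ increases while $\delta_-/T$ shrinks, the required $b$ falls quickly at first, so the product $b \times T$ typically has an interior minimum.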
$t, \alpha$. In the approximate distinct element counting algorithm in [4], $t := \lceil 96/\epsilon^2 \rceil$ and $\alpha := O(\lg(1/\delta_1)) = O(\lg(T/\delta_-))$.

In practice, one may safely choose to retain $t < \lceil 96/\epsilon^2 \rceil$ smallest values per parallel execution,
and run only one parallel execution, while retaining a confidence bound of $\delta_1$. We found
that $t := 25$ was sufficient.

Note that, when running very small examples, or with very high accuracy requirements,
one may obtain an assignment for $t$ that is $\ge k$. In this case, simply set $t := k$, and
note that $\epsilon = \delta_1 = 0$; each player is now collecting a sufficient number of signatures so as
to determine, without error, whether any particular bucket was hit by at least $k$ distinct
players.
$\rho$. The choice of $\rho$ must be based on the choice of anonymous message system. We provide
here an analysis based upon the scheme of [42], in which each node, upon receipt of
a message to be anonymously routed, sends it to a random neighbor with probability
$p_f > 0.5$, and to its intended destination with probability $1 - p_f$. Note that the probability
that a message will, somewhere along its anonymously routed path, encounter a malicious
node is at most $\frac{\lambda}{n} + \frac{\lambda}{n} p_f + \frac{\lambda}{n} p_f^2 + \cdots = \frac{\lambda}{n(1-p_f)}$. As we cannot ensure that a message is delivered
if it encounters a malicious node, we wish to choose $\rho$ such that, with high probability, at least one message will
be delivered to its destination. Thus, $\rho$ need only be logarithmic in the inverse of the desired failure probability.
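
A minimal sketch of this delivery analysis follows. It assumes, which the text does not state, that the $\rho$ messages fail independently, and it uses a hypothetical parameter `failure_target` for the acceptable probability that no message is delivered.

```python
import math

def capture_probability(lam, n, p_f):
    """Bound on Pr[a routed message meets a malicious node]:
    lam/n * (1 + p_f + p_f**2 + ...) = lam / (n * (1 - p_f))."""
    return lam / (n * (1 - p_f))

def min_rho(lam, n, p_f, failure_target):
    """Smallest rho with capture_probability**rho <= failure_target,
    assuming (hypothetically) that the rho messages fail independently."""
    q = capture_probability(lam, n, p_f)
    if q >= 1:
        raise ValueError("bound is vacuous: need lam < n * (1 - p_f)")
    if q == 0:
        return 1  # no malicious players: a single message suffices
    return max(1, math.ceil(math.log(failure_target) / math.log(q)))
```

For instance, with $\lambda = 100$, $n = 1000$ and $p_f = 0.75$ the capture bound is $0.4$, and 16 messages bring the probability that none is delivered below $10^{-6}$.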


B.2.2  Owner  Privacy 

Theorem 27. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest
player sent any given anonymous message with probability more than negligibly different from
a random guess. In the HotItem-ID protocol, for any element $a$, no coalition of at most $\lambda$
malicious players can gain more than a negligible advantage in determining if $a \in S_i$, for any
given honest player $i$ ($1 \le i \le n$).


Proof.  The  proof  of  this  theorem  is  straightforward. 


□ 


Theorem 29. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest
player sent any given anonymous message with probability more than negligibly different from
a random guess. In the Correlated Owner-Private HotItem-Pub protocol, for any element
$a$, no coalition of at most $\lambda$ malicious players can gain more than a negligible advantage in
determining if $a \in S_i$, for any given honest player $i$ ($1 \le i \le n$), assuming that the adversary
is given the set of hot items $P$, and the frequency of each hot item.


Proof.  The  proof  of  this  theorem  is  straightforward. 


□ 


Theorem 30. Assume that one-show tags are unlinkable and that the anonymous communication
system is secure such that no coalition of adversaries can distinguish which honest
player sent any given anonymous message with probability more than negligibly different from
a random guess. In the Uncorrelated Owner-Private HotItem-Pub protocol, for any element
$a$, no coalition of at most $\lambda$ malicious players can gain more than a negligible advantage in
determining if $a \in S_i$ for any given honest player $i$ ($1 \le i \le n$), assuming that the adversary is
given the set of hot items $P$, and the frequency of each hot item.

Additionally, given two elements $a, a' \in P$, no coalition of at most $\lambda$ malicious players
can gain more than a negligible advantage in determining if there exists an honest player $i$
($1 \le i \le n$) such that $a, a' \in S_i$, assuming that the adversary is given the set of hot items $P$,
and the frequency of each hot item.


Proof.  The  proof  of  this  theorem  is  straightforward. 


□ 
[Figure B.2, panels (1) and (2): example filter patterns with $h_1(a) = 2$, $h_2(a) = 4$, $h_3(a) = 1$]

Figure B.2: Players collect one-show tags for each filter bucket during the HotItem-ID protocol.
Given a complete set of filters for some element, one still cannot determine which element produced
the filters. However, each tag has a probability of $1 - t/k$ of having too high a value for the t-collection
phase, and thus of being hidden. When all tags for an element in a specific filter are removed, as in (2),
an even larger number of elements could have produced the observed filters.


B.2.3  Data  Privacy


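Theorem 28. In the HotItem-ID protocol, each element $a$, which appears in $f_a$ distinct
players' private input sets, has an indistinguishable set of expected size

$$\sum_{i=0}^{T} \binom{T}{i} \left(1 - \left(1 - \frac{t}{k}\right)^{f_a}\right)^{T-i} \left(\left(1 - \frac{t}{k}\right)^{f_a}\right)^{i} \frac{|M|}{b^{T-i}}.$$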

Proof. Let $M$ be the domain of possible private input set elements, such that $\bigcup_{i \in [n]} S_i \subseteq M$.
Given knowledge of $h_1(a), \ldots, h_T(a)$, we may infer that approximately $|M|/b^T$ possible elements
$a \in M$ could have produced that filter pattern, as there are $b^T$ total possible filter patterns
caused by one element and $h_1, \ldots, h_T$ are cryptographically secure hash functions. As illustrated
in Figure B.2, if information about one or more filters has been elided, a correspondingly larger
number of elements could have produced that filter pattern. A filter is elided when all $f_a$ tags
for $a$'s bucket are hidden, which happens with probability $(1 - t/k)^{f_a}$. We may therefore use Bernoulli trials to
calculate the expected size of the indistinguishable set for element $a$, which appears in $f_a$ players'
private input sets:

$$E[\text{size of indis. set}] = \sum_{i=0}^{T} \Pr[i \text{ filters are elided}] \, \frac{|M|}{b^{T-i}}$$

We graph in Figure 5.7 the expected size of the indistinguishable set, as a proportion of the total domain $M$,
as $f_a$ increases. Note that if $a$ appears in only a few players' private input sets, a very
large proportion of the domain is indistinguishable from $a$. As $f_a$ approaches $k/t$, less and less
of the domain is indistinguishable; this ensures that truly rare elements are highly
protected. As $t$ is a constant, independent of $n$, while $k$ will often grow with the size of the
network, protection for rare elements generally increases as the network grows.

□
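
The expected size in Theorem 28 is straightforward to evaluate numerically. The following is a minimal sketch with hypothetical parameter names, treating each filter as independently elided with probability $(1 - t/k)^{f_a}$, as in the proof:

```python
import math

def expected_indistinguishable(domain_size, b, T, t, k, f_a):
    """Expected number of domain elements consistent with a's observed filters."""
    hidden = (1 - t / k) ** f_a  # Pr[all f_a tags of one filter are hidden]
    return sum(
        math.comb(T, i) * hidden**i * (1 - hidden) ** (T - i) * domain_size / b ** (T - i)
        for i in range(T + 1)  # i = number of elided filters
    )
```

At $f_a = 0$ every filter is elided and the whole domain is indistinguishable; for very large $f_a$ the value approaches $|M|/b^T$.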
B.2.4  Analysis  for  Bloom  Filters


Theorem 31. Given the false positive rate $\delta_+$ and the false negative rate $\delta_-$, error bounds $\epsilon$
and $\beta$, and the upper limit $\lambda$ on the number of malicious participants, let $b, t, T, \rho$ be chosen as
follows: $t := \lceil 96/\epsilon^2 \rceil$; $\rho$ as for Theorem 26; $\alpha := O(\lg(T/\delta_-))$; and $b$ and $T$ chosen to minimize $b \times T$
while satisfying

$$\left(\frac{mT \left\lfloor \frac{n - \beta k - \lambda}{k/(1+\epsilon) - \beta k - \lambda} \right\rfloor}{b} + \frac{\delta_-}{T}\right)^T \le \delta_+.$$

In the HotItem-ID protocol, with probability at least $1 - \delta_+$, every element $a$ that appears
in $f_a \le \beta k$ players' private input sets is not identified as a $k$-threshold hot item.

In the HotItem-ID protocol, with probability at least $1 - \delta_-$, every element $a$ that appears
in $f_a \ge k/(1-\epsilon)$ players' private input sets is identified as a $k$-threshold hot item.

Proof. The proof of this theorem very closely follows the proof of Theorem 26. □


B.3  Details  of  One-Show  Tags 

In this section, we describe a one-show tag scheme obtained through the modification of the
group signature scheme of Boneh and Shacham [7]. Each one-show tag is lightweight, requiring
only 1539 bits. Using the same techniques, similar constructions can be obtained from other
group signature schemes.

A group signature scheme allows each member of a group of players to sign messages on behalf
of the group. Given these signatures, no player or coalition of players (except the trusted group
manager) can identify the player that produced any signature, nor can they determine if
two signatures were produced by the same group member.

In the Boneh/Shacham group signature scheme, the group public key is $pk = (g_1, g_2, w)$,
where $g_1 \in G_1$, $g_2 \in G_2$, and $w = g_2^{\gamma}$ for $\gamma \in \mathbb{Z}_p^*$. ($p$ can be taken to be a 170-bit prime,
and the elements of $G_1, G_2$ can be represented in 171 bits [7].) The trusted group manager
holds the group secret key $\gamma$. Each user $i$ ($1 \le i \le n$) has a private key $s_i = (A_i, x_i)$, where
$x_i \in \mathbb{Z}_p$ and $A_i \in G_1$. Using his private key, each player may sign a message, using a variant of the
Fiat-Shamir heuristic [20], by proving knowledge of a pair $(A_i, x_i)$ such that $A_i^{x_i + \gamma} = g_1$.

We may modify this group signature scheme to include provably correct one-show values,
making each signature a one-show tag. Each user $i$ ($1 \le i \le n$) may construct a one-show tag
for bucket $j$ of filter $q$ ($1 \le q \le T$, $1 \le j \le b$) by: (1) signing the message $q \| j$ essentially as
in the original signature scheme; (2) computing two additional values to enable the recipient to
compute the one-show value.

Each user generates $g_{q,j} \in G_2$ by an agreed-on scheme; we discuss this element later in the
section. We utilize the same bilinear map $e$ as in the computation of the main signature,
as well as the values $v$, $\alpha$ and $r_\alpha$, computed as intermediate values in the main
signature. User $i$ computes these additional elements for a one-show tag:

$$T_3 = e(v, g_{q,j})^{r_\alpha}, \qquad T_4 = e(v, g_{q,j})^{\alpha}$$

The sole change to the original signature scheme is that the challenge value $c$ is computed as
$c = \mathcal{H}(pk, q \| j, r, T_1, T_2, T_3, T_4, R_1, R_2, R_3)$.
The recipient conducts all the validity checks specified in the Boneh/Shacham signature
scheme, as well as the following additional check, derived from a proof of discrete logarithm
equality [12]:

$$e(v, g_{q,j})^{s_\alpha} = T_4^{c} \, T_3$$

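We define the one-show value as $e(A_i, g_{q,j})$; note that this value cannot be constructed by any
player other than $i$ and that player $i$ can construct exactly one such value. To compute this
value, the signature recipient computes:

$$\left(e(T_2, g_{q,j})^{c} \, e(v, g_{q,j})^{-s_\alpha} \, T_3\right)^{1/c}$$

Since $T_2 = A_i v^{\alpha}$ and, by the check above, $e(v, g_{q,j})^{-s_\alpha} T_3 = e(v, g_{q,j})^{-c\alpha}$, this expression equals $e(A_i, g_{q,j})$.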

The  additional  zero-knowledge  proof  required  for  the  one-show  tag  construction  is  efficient, 
and  thus  our  one-show  tag  construction  and  verification  is  nearly  as  efficient  as  the  original 
group  signature  scheme.  Note  that  these  one-show  tags  are  unlinkable,  anonymous,  and  can  be 
verified  by  all  players  who  hold  the  group  public  key. 
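
The algebra behind the additional check and the one-show value can be sanity-checked in a toy model. In the sketch below, which is insecure and purely illustrative, every group element is represented by its discrete logarithm modulo a prime $Q$, so the pairing becomes multiplication of exponents and multiplication in $G_T$ becomes addition. It omits every component of the Boneh/Shacham signature other than $T_2$, $T_3$, $T_4$, $c$ and $s_\alpha$, and draws $v$ at random rather than from the hash used in [7]; all names are hypothetical.

```python
import hashlib
import secrets

Q = 2**61 - 1  # prime order of the toy groups (a Mersenne prime); insecure, illustration only

def inv(x):
    return pow(x, -1, Q)  # modular inverse (Python 3.8+)

def pair(x_log, y_log):
    # Toy pairing e(g1^x, g2^y) = gT^(x*y); elements are stored as discrete logs.
    return x_log * y_log % Q

def challenge(*parts):
    digest = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest, "big") % Q or 1  # avoid c = 0

def make_tag(A_log, g_qj_log, msg):
    v = secrets.randbelow(Q - 1) + 1
    alpha, r_alpha = secrets.randbelow(Q), secrets.randbelow(Q)
    T2 = (A_log + v * alpha) % Q           # T2 = A_i * v^alpha
    T3 = pair(v, g_qj_log) * r_alpha % Q   # T3 = e(v, g_qj)^r_alpha
    T4 = pair(v, g_qj_log) * alpha % Q     # T4 = e(v, g_qj)^alpha
    c = challenge(msg, T2, T3, T4)         # a real verifier recomputes c itself
    s_alpha = (r_alpha + c * alpha) % Q
    return {"v": v, "T2": T2, "T3": T3, "T4": T4, "c": c, "s_alpha": s_alpha}

def verify_extra(tag, g_qj_log):
    lhs = pair(tag["v"], g_qj_log) * tag["s_alpha"] % Q  # e(v, g_qj)^s_alpha
    rhs = (tag["c"] * tag["T4"] + tag["T3"]) % Q         # T4^c * T3
    return lhs == rhs

def one_show_value(tag, g_qj_log):
    # (e(T2, g_qj)^c * e(v, g_qj)^(-s_alpha) * T3)^(1/c)
    inner = (tag["c"] * pair(tag["T2"], g_qj_log)
             - tag["s_alpha"] * pair(tag["v"], g_qj_log) + tag["T3"]) % Q
    return inner * inv(tag["c"]) % Q
```

Two tags produced by the same user for the same bucket yield the same one-show value even though their $T_2$ components differ, which is the property the HotItem-ID protocol relies on.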

The parameter $g_{q,j}$, used to construct a one-show value associated with bucket $j$ of filter $q$
($1 \le q \le T$, $1 \le j \le b$), can be efficiently constructed in a variety of ways such that no player
knows its discrete logarithm. In this section, we briefly describe one such approach.

Let PRNG be a pseudo-random number generator with range $G_2$ [40]. Let $\mathcal{H}_{PRNG}$ be a hash
function that takes bit strings as input and outputs data suitable for input into PRNG. Let
$a = \mathcal{H}_{PRNG}(g \| q \| j)$. Let the $\ell$th element output from this PRNG on seed $a$ be denoted $a_\ell$.
Each player calculates the element $a_p$, the first generator of $G_2$ in the sequence $a_1, a_2, \ldots$. Set
9q,j  ~ 